Re: 2 Windows 2003 clients with huge # of files consistently failing
John,

We use Verint Ultra here as well. It is a pain, but it can be managed with TSM. You just have to use every tool available in TSM: journal-based backups, the disk-based memory-efficient option, etc.

Here is the dsm.opt file that we use, along with some sample dsmsched.log outputs from non-journaled and journaled backups. Note the exclusion of the F: drive. We discovered that the actual database files are covered by our SQL Server backups, and everything on that drive was causing backup problems, so the Verint engineers suggested excluding the entire drive.

Good luck,
Steve Schaub
Systems Engineer, Windows
BlueCross BlueShield of Tennessee
steve_sch...@bcbst.com

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of John C Dury
Sent: Friday, January 16, 2009 8:45 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] 2 Windows 2003 clients with huge # of files consistently failing
Re: 2 Windows 2003 clients with huge # of files consistently failing
Basically, the app records all calls to our customer service reps (CSRs) as WAV files, plus AVI files for screen activity. The files are all very small, but there are a lot of them, and because there are about 100 CSRs, a new directory is created for each call. As an example, on Jan 14 2009 there were 592 AVI and 592 WAV files, but they live in 4043 directories. Ridiculous! Unfortunately, regulations require us to keep the call data. I've contacted Verint (maker of Ultra, the app) to see if they have alternatives to how their data is stored. I can't imagine ever having to restore this server.

I'll go back and start researching image backups again, although I couldn't get them to work the first time: the backup got through about 12G and then just hung and never progressed any further. There were no errors anywhere I looked (actlog, Event Viewer, tsmerror.log, etc.). I also thought about possibly using Tivoli Continuous Data Protection (CDP). Do you think that is an option?

Thanks for all your help and ideas,
John
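To put numbers like "592 AVI and 592 WAV files in 4043 directories" on any given day's tree, a small script can count directories and files by extension. This is only a rough sketch in Python; the function name `summarize_tree` is made up for illustration, and the path you pass it is whatever date directory the application writes to.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Count directories and files (grouped by lowercase extension) below root."""
    dir_count = 0
    ext_counts = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        # Every entry in dirnames is one subdirectory the backup client must scan.
        dir_count += len(dirnames)
        for name in filenames:
            ext_counts[os.path.splitext(name)[1].lower()] += 1
    return dir_count, ext_counts
```

Calling `summarize_tree` on one day's directory returns the directory count and a per-extension file count, which makes the directory-to-file ratio easy to show to the application vendor.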
Re: 2 Windows 2003 clients with huge # of files consistently failing
John,

Perhaps you could get creative here. Assuming there is a way of grouping files by date, let's say all files are saved to:

D:\20090114\...
D:\20090115\...
D:\20090116\...
etc.

You could, at the cost of some disk space, do the following:

1. Run a batch/script before the backup window which tars/zips that day's files and saves them to some backup directory, and also cleans up old tars/zips in the backup directory.
2. Exclude the directories where your files reside.
3. Let the backup run and back up the backup directory.

Kind of treat it like you would dumping a db to a flat file so it can be backed up. The restore then consists of restoring the backup directory, with an additional step to unzip/untar.

Hope this helps to give you another way of looking at the problem.

Justin

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of John C Dury
Sent: Friday, January 16, 2009 8:45 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] 2 Windows 2003 clients with huge # of files consistently failing
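The pre-backup step Justin describes (zip one day's date-named directory into a backup directory, then clean up old archives) might look roughly like the Python sketch below. It assumes the hypothetical D:\YYYYMMDD layout from his example; the function names, the archive location and the retention period are all made up for illustration, not anything Verint or TSM provides.

```python
import datetime
import pathlib
import shutil


def archive_day(source_root, archive_dir, day):
    """Zip the directory for one day (named YYYYMMDD) into archive_dir."""
    source_root = pathlib.Path(source_root)
    archive_dir = pathlib.Path(archive_dir)
    day_dir = source_root / day.strftime("%Y%m%d")
    if not day_dir.is_dir():
        return None  # nothing was recorded under that date
    archive_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to the base name and returns the final path.
    base_name = archive_dir / day_dir.name
    return pathlib.Path(
        shutil.make_archive(str(base_name), "zip", root_dir=str(day_dir))
    )


def prune_archives(archive_dir, keep_days, today):
    """Delete YYYYMMDD.zip archives older than keep_days; return removed names."""
    cutoff = today - datetime.timedelta(days=keep_days)
    removed = []
    for zip_path in sorted(pathlib.Path(archive_dir).glob("*.zip")):
        try:
            stamp = datetime.datetime.strptime(zip_path.stem, "%Y%m%d").date()
        except ValueError:
            continue  # not one of our date-named archives, leave it alone
        if stamp < cutoff:
            zip_path.unlink()
            removed.append(zip_path.name)
    return removed
```

A scheduled task could call `archive_day("D:/", "E:/call_archive", yesterday)` followed by `prune_archives("E:/call_archive", keep_days=14, today=today)` before the backup window, with E:/call_archive being the only path the nightly incremental has to scan. Only prune archives that are already safely backed up; the 14-day figure here is arbitrary.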
Re: 2 Windows 2003 clients with huge # of files consistently failing
John -

In a logging-files environment such as you cite, where the data is largely write-only and historic, a hierarchical storage approach would be more reasonable: data more than about a week old would migrate to a cheaper, lower-level mass storage area whose entirety would not have to be regularly scanned for backup. (It's easy to launch an Incremental backup on just the data identified by the migration task.) Recent data would be held in a higher-level area of much smaller size, whose performance would meet the needs of the application and which would be much more reasonable to scan for backup.

We TSM administrators often end up the victims of data architectures which weren't thought out for all aspects of their management (in our case, Backup and Restore), and we aren't in a position to re-engineer the layout. If the new data is in some way identifiable by name or timestamp in the directory name, or by identification in some application logging, it might be possible to focus Incremental backups on that subset of the file system, rather than running an Incremental over the whole thing.

Beyond that, Image or CDP backups may be worth pursuing further. Also have the organization consider whether just mirroring will meet recovery needs: in some implementations, conventional file backups may not be necessary.

Richard Sims
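If the new data really is identifiable by a date in the directory name, focusing the Incremental on that subset could be scripted along these lines. This is a sketch only: it assumes a hypothetical D:\YYYYMMDD layout, the function names are invented, and it merely builds `dsmc incremental <filespec> -subdir=yes` command lines for review rather than running them. Check the resulting syntax against your client level before scheduling anything.

```python
import datetime


def recent_day_dirs(root, days, today):
    """Return date-named directories (root\\YYYYMMDD) for the last `days` days."""
    paths = []
    for offset in range(days):
        stamp = (today - datetime.timedelta(days=offset)).strftime("%Y%m%d")
        paths.append(root + "\\" + stamp)
    return paths


def incremental_commands(root, days, today):
    """Build one dsmc incremental command line per recent day directory."""
    return [
        'dsmc incremental "' + path + '\\*" -subdir=yes'
        for path in recent_day_dirs(root, days, today)
    ]
```

For example, `incremental_commands("D:", 2, datetime.date(2009, 1, 16))` yields one command for D:\20090116 and one for D:\20090115, so the client scans two day directories instead of walking the whole drive.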
Re: 2 Windows 2003 clients with huge # of files consistently failing
I guess I have different questions than I have already seen. How is the memory usage on the node? Is it 32 or 64 bit? Is the agent hanging or working during the 9000 seconds?

We have 2 systems that back up 10M+ objects in an equally crazy number of directories, and we have not seen the error you have. One machine is 32 bit and uses the memory-efficient disk cache; the other is 64 bit and does not need memory help. Both have static file systems during the backup. (That is another story.)

We have seen a similar error to yours on 2003 servers with SP2. Take a look at Microsoft KB 948496 if the server is running SP2.

Andy Huebner

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of John C Dury
Sent: Thursday, January 15, 2009 8:27 AM
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] 2 Windows 2003 clients with huge # of files consistently failing
Re: 2 Windows 2003 clients with huge # of files consistently failing
I also have two problem children like these. I turned on journaling, but the journaling ran out of (virtual) memory. My only other option was to turn on the memory-efficient disk cache method to get through the memory issue with journaling.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of John C Dury
Sent: Thursday, January 15, 2009 6:27 AM
To: ADSM-L@VM.MARIST.EDU
Subject: 2 Windows 2003 clients with huge # of files consistently failing
2 Windows 2003 clients with huge # of files consistently failing
I have two separate Windows 2003 boxes, both running the v5.5.1.10 client, that are failing their incrementals every night. Both boxes have hundreds of thousands of files spread across many directories. In fact, each day a new directory is created, then multiple subdirectories are created under it, with thousands of files in each of those subdirectories. I mention this because I don't think these boxes are candidates for multiple virtual nodes, given the new directories that are created every day.

I do have journaling turned on, although it doesn't seem to help with the large number of files either: when I run an incremental manually, it takes forever and never seems to finish. I thought about doing image backups of the drive where the thousands of files live, but when I tried it, it backed up about 14 GB and then just hung and never continued. I had to cancel it after waiting for an hour or so.

What is my best strategy for dealing with these two boxes that are generating thousands of new files in new directories every day? The huge number of objects in the TSM DB is also starting to cause quite a few problems with daily processing: expiration is running longer and longer, since I think it is choking on the number of objects.

And to make it even weirder, they both fail incrementals at night and the only error I can find is:

ANR0481W Session 16603 for node SERVERNAME (WinNT) terminated - client did not respond within 9000 seconds. (SESSION: 16603)

I'm starting to think that TSM is just not the backup solution for either of these boxes.
Re: 2 Windows 2003 clients with huge # of files consistently failing
On 15 jan 2009, at 15:26, John C Dury wrote:

> I have two separate Windows 2003 boxes both running v5.5.1.10 client that are both failing their incrementals every night. Both of these boxes have hundreds of thousands of files all spread into multiple directories. In fact, each day, a new directory is created and then multiple subdirectories are created under it and thousands of files in each of those subdirectories. The reason I say this is because I don't think it is a candidate for multiple virtual nodes because of the new directories that are created every day. I do have journaling turned on although it doesn't seem to help with the large number of files either as when I run an incremental manually, it takes forever and never seems to finish.

Are you sure that the journal is running and has enough space? In these cases, having the journals on a separate filesystem might be a very good idea. I have the feeling that there is not enough space for the TSM journal database...

> I thought about doing image backups of the drive where the thousands of files live but when I tried it, it backed up about 14g and then just hung and never continued. I had to cancel it after waiting for an hour or so.

And to what type of storage do these images go? I'd think that in the case of an image backup you'd want a management class that makes them go directly to tape. My guess is that these were going to disk volumes?

> What is my best strategy for dealing with these two boxes that are generating thousands of new files in new directories every day? The huge number of objects in the TSM DB are starting to cause quite a few problems with daily processing also as expiration is running longer and longer since I think it is choking on the number of objects.

I'd say that image backups are a good idea in cases of very active filesystems. Filesystems on Windows with huge numbers of files are always a cause of problems, not only with TSM.

> And to make it even weirder, they both fail incrementals at night and the only error I can find is:
> ANR0481W Session 16603 for node SERVERNAME (WinNT) terminated - client did not respond within 9000 seconds. (SESSION: 16603)

meaning that the client is indeed choking on the size of the directories.

> I'm starting to think that TSM is just not the backup solution for either of these boxes.

I'm also thinking that if you have a piece of software creating 1000's of files per day in a filesystem, this is a very big workload. I'm very sure that with VSS snapshots and image backups you are on the right track, and no other product could do a better job of backing up these filesystems.

--
Remco Post
r.p...@plcs.nl
+31 6 24821 622
Re: 2 Windows 2003 clients with huge # of files consistently failing
On Jan 15, 2009, at 9:26 AM, John C Dury wrote:

> ANR0481W Session 16603 for node SERVERNAME (WinNT) terminated - client did not respond within 9000 seconds. (SESSION: 16603)

If TSM is struggling to get through the directories, then applications associated with the data may be suffering the same problem. This may be the result of an indifferent directory layout (far too many files in directories), a disk hardware issue, contention, or a file system issue (where chkdsk or equivalent might be run). The hardware may simply be underpowered for the amount of data involved (e.g., 5400 rpm disks or perhaps older ATA pathing). Or the file system type may be an inefficient choice.

Large-scale data deployments cry out for a knowledgeable data architect in order to be successful and to scale - and that skill is often absent. The owners of the data should be strongly advised to regard the backup problem as a proportional indication of how very painful a file-oriented restoral would be, where reconstructing Windows directory entries is notoriously time-consuming.

Richard Sims