Re: Deduplication
Hi Stefan! I have it set to 0, but like Dave mentioned, you have to wait a few hours. The reason it wasn't working as expected in my case was that another user had backed up the exact same directory too, so I had to delete not only my backup files but also his. It's a test environment, by the way. :-)
Kind regards,
Eric van Loon
Air France/KLM Storage Engineering
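For anyone who wants to check this on their own server: the reuse delay on a directory-container pool can be queried and changed from the administrative command line. A minimal sketch, assuming a dsmadmc session and a hypothetical pool name CONTPOOL:

  tsm: SERVER1> query stgpool CONTPOOL f=d
  (look for the delay period for container reuse, reported in days)
  tsm: SERVER1> update stgpool CONTPOOL reusedelay=0

Even at 0, as Del describes later in the thread, the server may hold recently deleted chunks for a short window until its background deletion catches up.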
Re: Deduplication
Eric,
The container pool has a reuse delay setting, in days, that in effect works the same as the reuse delay on traditional storage pools. Did you set this to 0? It is specified in days, not hours, and the default is 1.
Regards,
Stefan
Re: Deduplication
Hi Dave!
Thank you very much for your reply! I deleted some data this morning and waited four hours before making a new backup, but that doesn't seem to be enough. Is there any way to influence this waiting period? A certain table reorg, a stop/start of the server, or is it just hard-coded?
Thanks again for your help!
Kind regards,
Eric van Loon
Air France/KLM Storage Engineering
Re: Deduplication
Hi Steve!
I would hope not, because in that case TSM would have a data integrity issue...
Kind regards,
Eric van Loon
Air France/KLM Storage Engineering
Re: Deduplication
Hi Eric,

A few things:

- Client-side deduplication provides better overall throughput for Spectrum Protect because the work is spread across more CPUs. So if you can afford to do the deduplication client-side, that gives the best overall result.

- Client-side deduplication helps reduce network traffic.

- The algorithms used for deduplication are the same on the client and the server.

The behavior you are seeing has to do with the impact of reusedelay on deduplicated chunks. If the reusedelay is 1 day (the default), Spectrum Protect keeps the deduplicated chunks pinned in storage until that time has passed. If the reusedelay is 0, there is still a small cushion window that might allow the chunks to still be linked to. If you waited a couple of hours AFTER the deletion occurred, I would not expect those chunks to be reused.

Del
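For context, switching a node to client-side deduplication takes both a server-side node setting and a client option. A minimal sketch, with ERICNODE as a hypothetical node name (DEDUPLICATION is the attribute/option name on both sides):

  tsm: SERVER1> update node ERICNODE deduplication=clientorserver

and in the client options file (dsm.opt on Windows, the dsm.sys stanza on UNIX):

  deduplication yes

With DEDUPLICATION=SERVERONLY on the node, the client option is ignored and all chunking happens on the server.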
Re: Deduplication
Perhaps the client-side dedupe is keeping a dedupe hash bitmap that is not getting fully refreshed when you purge the backup data from the server?

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Loon, Eric van (ITOPT3) - KLM
Sent: Monday, April 10, 2017 10:57 AM
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] Deduplication

Hi guys!
We are trying to make a fair comparison between server-side and client-side deduplication. I'm running into an 'issue' where I notice that once you have created a backup of a certain set of data, it is always deduplicated 100% afterwards when you start a new client-side deduped backup, even when you delete all previous backups on the server first! So I backed up a directory, retrieved all object IDs through a select * from backups, and deleted all objects, but still a new backup is deduplicated 100%. I don't understand why. I thought it might have something to do with data still being in the container pool, but even with reusedelay=0, everything is deduplicated...
Thanks for any help (Andy? :)) in advance.
Kind regards,
Eric van Loon
Air France/KLM Storage Engineering
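Steve's cache theory can be tested directly: the backup-archive client only keeps a local dedup cache when ENABLEDEDUPCACHE is on, so turning it off forces the client to ask the server about every chunk. A sketch of the relevant client options (values and path are illustrative, not recommendations):

  enablededupcache no
  * or, to keep a deliberately small cache that turns over quickly:
  * enablededupcache yes
  * dedupcachesize 10
  * dedupcachepath /var/tsm/dedupcache

If backups stop showing 100% dedup with the cache disabled, a stale cache was the culprit rather than the server's chunk inventory.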
Re: Deduplication and database backups
If I understand correctly, the large object size will place it in the highest dedup tier, and that will make Spectrum Protect create fewer chunks of a larger size than if the object were smaller. But even then I would still expect to see some dedup results, even if it were just a few percent, especially if you create a second full backup.
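The tiering referred to above is governed by server options; a sketch of the dsmserv.opt entries, assuming the commonly documented defaults of 100 GB and 400 GB (these options exist for server-side deduplication, but whether they influence a given pool type and release is worth verifying against your level's documentation):

  * dsmserv.opt - objects above these sizes are chunked at larger average sizes
  DEDUPTIER2FILESIZE 100
  DEDUPTIER3FILESIZE 400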
Re: Deduplication and database backups
Hi Guys,

Thanks for the feedback. My feeling is that it must be that the HANA API does not split the objects into smaller chunks; I am actually seeing the same issue when doing Sybase ASE backups - again large objects, but still under 50 GB.

I see good deduplication on MSSQL and Domino backups in directory container pools.

Eric, HANA is SAP's own in-memory database, no Oracle.

I have client compression turned off, and even if database compression were turned on I would expect some deduplication; 0 is a pretty definitive no dedup.

*Arni Snorri Eggertsson*
ar...@gormur.com
Re: Deduplication and database backups
Hi Arni!
Just a thought: could it be that Oracle compression is turned on?
Kind regards,
Eric van Loon
Air France/KLM Storage Engineering
Re: Deduplication and database backups
I've seen plenty of databases go to container pools and get fair to good deduplication results, even on the first backup. It should not matter that it is one large object; it will make the chunks larger, but normally you should still get some deduplication as long as it's not encrypted. It would seem like something strange that might just be HANA-specific?

On Wed, Mar 30, 2016 at 3:34 PM, Arni Snorri Eggertsson <ar...@gormur.com> wrote:
> Hi all,
>
> I want to hear what others are doing with regard to deduplication and
> large files / database backups.
>
> On a recent setup we are taking backups of a SAP HANA system to a
> directory container. I see great dedup stats when the system is doing
> log backups, but I get no deduplication effect when we are doing full
> backups. The database is roughly 250 GB in size, and it looks like
> TSM sees the object as one file.
>
> ANR0951I Session 550996 for node x processed 1 files using inline
> deduplication. 251,754,067,764 bytes were reduced by 0 bytes. (SESSION:
> 550996)
>
> I am not 100% sure how to handle this. Are others using directory
> containers at all? Are you using them for TDP database backups? Any
> thoughts?
>
> *Arni Snorri Eggertsson*
> ar...@gormur.com
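To confirm whether a particular node's data deduplicates at all in a directory-container pool, the per-node statistics can be generated and then queried; a sketch, with pool and node names as placeholders:

  tsm: SERVER1> generate dedupstats CONTPOOL HANANODE
  tsm: SERVER1> query dedupstats CONTPOOL HANANODE f=d

A Total Saving Percentage near zero for the full-backup filespace, alongside healthy numbers for the log filespaces, would match what Arni reports.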
Re: Deduplication questions, again
I'll third the odd percentages... using 7.1.3.100.

tsm: TSMPRD02> select sum(reporting_mb) from OCCUPANCY where stgpool_name='SASCONT0'

      Unnamed[1]
----------------
    182520798.90

tsm: TSMPRD02> q stg sascont0

Storage     Device       Storage     Estimated    Pct    Pct    High   Low    Next
Pool Name   Class Name   Type        Capacity     Util   Migr   Mig    Mig    Storage
                                                                Pct    Pct    Pool
---------   ----------   ---------   ----------   ----   ----   ----   ----   ---------
SASCONT0                 DIRECTORY   204,729 G    50.5                        VTLPOOL02

Rounded numbers:
Pool Capacity: 204 TB
Pool Used: 50% (102 TB)
Reporting Capacity: 180 TB
Savings: 56%

OC shows savings of 0%. Servergraph also shows a pool savings of 0%. That said, some of our TDP backups used to see 93% savings and now show 0%, so something may be going on with our containers.
---
David Nixon
Storage Engineer II
Technology Services Group
Carilion Clinic
451 Kimball Ave.
Roanoke, VA 24015
Phone: 540-224-3903
cdni...@carilionclinic.org

Our mission: Improve the health of the communities we serve.
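David's comparison can be scripted so that every container pool is checked the same way; a sketch against the same OCCUPANCY column he queries above (table and column names as in his example):

  tsm: TSMPRD02> select stgpool_name, sum(reporting_mb) from occupancy group by stgpool_name

Dividing each pool's used capacity from "q stg" (Estimated Capacity x Pct Util) by that per-pool sum gives the effective reduction ratio, independent of what the savings fields report.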
Re: Deduplication questions, again
Matthew,

Just a question: how do you know the size of the pre-dedup data? Did you make use of backup reports on each client to get that information, build some query based on the dedupstats table, or something else? Here again, I cannot seem to find coherent information in the TSM output... Let's take an example:

q occ CH2RS901 /

Node Name   Type   Filespace   FSID   Storage     Number of   Physical   Logical
                   Name               Pool Name   Files       Space      Space
                                                              Occupied   Occupied
                                                              (MB)       (MB)
---------   ----   ---------   ----   ---------   ---------   --------   --------
CH2RS901    Bkup   /           4      CONT_STG    8,148       -          165.78

So, from the "q occ" output we have 165.78 MB of logical space occupied. But:

q dedupstats CONT_STG CH2RS901 / f=d

Date/Time: 03/22/16 16:03:30
Storage Pool Name: CONT_STG
Node Name: CH2RS901
Filespace Name: /
FSID: 4
Type: Bkup
Total Data Protected (MB): 167
Total Space Used (MB): 36
Total Space Saved (MB): 131
Total Saving Percentage: 78.34
Deduplication Savings: 137,056,854
Deduplication Percentage: 78.34
Non-Deduplicated Extent Count: 8,161
Non-Deduplicated Extent Space Used: 7,903,461
Unique Extent Count: 6
Unique Extent Space Used: 100,858
Shared Extent Count: 3,176
Shared Extent Data Protected: 166,937,957
Shared Extent Space Used: 29,783,486
Compression Savings: 0
Compression Percentage: 0.00
Compressed Extent Count: 0
Uncompressed Extent Count: 11,343

If I trust this output, I have backed up 167 MB and TSM deduped it down to 36 MB... Could anyone explain why "q occ" finds a "logical space occupied" of 165.78 MB? Shouldn't it be 36 MB? The help for the "q occ" command states:

Logical Space Occupied (MB)
The amount of space that is occupied by logical files in the file space. Logical space is the space that is actually used to store files, excluding empty space within aggregates. For this value, 1 MB = 1048576 bytes.

I'm lost here...

Cheers.
Arnaud
Re: Deduplication questions, again
Arnaud,

I too am seeing odd percentages where container pools and dedup are concerned. I have a small remote server pair that protects ~23 TB of pre-dedup data, but my container pools show an occupancy of ~10 TB, which should be a data reduction of over 50%. However, a q stg on the container pool only shows a data reduction ratio of 21%. Of note, I use client-side dedup on all the client nodes at this particular site, and I think that's mucking up the data reduction numbers on the container pool. The 21% figure seems to be the reduction AFTER client-side dedup, not the total data reduction. It's confusing.

On the plus side, I just put in the new 7.1.5 code at this site, and the compression is working well and does not appear to add a noticeable amount of CPU cycles during ingest. Since the install date on the 18th, I've backed up around 1 TB pre-dedup, and the compression savings are rated at ~400 GB, which is pretty impressive. I'm going to do a test restore today and see how it performs, but so far so good.

__
Matthew McGeary
Technical Specialist - Infrastructure
PotashCorp
T: (306) 933-8921
www.potashcorp.com

From: PAC Brion Arnaud <arnaud.br...@panalpina.com>
To: ADSM-L@VM.MARIST.EDU
Date: 03/22/2016 03:52 AM
Subject: [ADSM-L] Deduplication questions, again
Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>

Hi All,

Another question with regard to TSM container-based deduplicated pools... Are you experiencing the same behavior as this? Using "q stg f=d" targeting a deduped container-based storage pool, I observe the following output:

q stg f=d

Storage Pool Name: CONT_STG
Storage Pool Type: Primary
Device Class Name:
Storage Type: DIRECTORY
Cloud Type:
Cloud URL:
Cloud Identity:
Cloud Location:
Estimated Capacity: 5,087 G
Space Trigger Util:
Pct Util: 55.8
Pct Migr:
Pct Logical: 100.0
High Mig Pct:
(skipped a few lines...)
Compressed: No
Deduplication Savings: 0 (0%)
Compression Savings: 0 (0%)
Total Space Saved: 0 (0%)
Auto-copy Mode:
Contains Data Deduplicated by Client?:
Maximum Simultaneous Writers: No Limit
Protection Storage Pool: CONT_STG
Date of Last Protection: 03/22/16 05:00:27
Deduplicate Requires Backup?:
Encrypted:
Space Utilized(MB):

Note the "Deduplication Savings" output (0%). However, using "q dedupstats" on the same stgpool, I get the following output (just a snippet of it):

Date/Time: 03/17/16 16:31:24
Storage Pool Name: CONT_STG
Node Name: CH1RS901
Filespace Name: /
FSID: 3
Type: Bkup
Total Saving Percentage: 78.11
Total Data Protected (MB): 170

Date/Time: 03/17/16 16:31:24
Storage Pool Name: CONT_STG
Node Name: CH1RS901
Filespace Name: /usr
FSID: 4
Type: Bkup
Total Saving Percentage: 62.25
Total Data Protected (MB): 2,260

How can it be that I see dedup on one side but not on the other? Thanks for enlightenment!

Cheers.
Arnaud
Re: deduplication status
39 is actually not a great number; it means you are getting less than 2-for-1 dedup. Unless you have backups running hard 24 hours a day, those dedup processes should finish. When you do Q PROC, if the processes have any work to do, they show as ACTIVE; if not, they show IDLE. I'd think that at some point during the day you should have at least one of them go idle; then you know you have been as aggressive as you can be. If not, I'd add processes until you can see some idle time on at least one of them. Just my 2 cents.
W

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Tyree, David
Sent: Monday, July 28, 2014 11:42 AM
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] deduplication status

I've searched the archives but I can't really find the answer I'm looking for. Running on version 6.3.1.0. I have a primary storage pool running dedup. I run the ID DEDUP command, the reclamation command, and the expiration command throughout the daily cycle. I run the Q STGP F=D command to check the Duplicate Data Not Stored numbers and I'm getting 39% right now, which sounds pretty good, I guess.

My question is how do I tell if I'm running the dedup processes aggressively enough. Can I do something to increase that number? I realize that the dedup processes are never really finished because of the new data that is constantly coming in and old data getting expired. Is there something I can look at to be able to tell if I need to adjust the ID DUP commands I'm running? More processes, fewer processes, or a change in how long I run them... Something that tells me how much data has not been deduped yet versus how much has been processed. Is that kind of info accessible?

David Tyree
System Administrator
South Georgia Medical Center
229.333.1155
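For the legacy FILE-pool deduplication David describes, the process count is set on the IDENTIFY command itself, and Wanda's ACTIVE/IDLE test is just Q PROC; a sketch with illustrative pool name and values:

  tsm: SERVER1> identify duplicates BACKUPPOOL numprocess=4 duration=480
  tsm: SERVER1> query process

If none of the identify processes ever shows IDLE during the day, add processes or duration until at least one does.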
Re: Deduplication number of chunks waiting in queue continues to rise?
Hey, Nick, missed your name the first time around! Being in higher-ed/research we went the cheap route and actually just use direct-attach 15K SAS drives on Dell servers, divvied up into multiple RAID-10 sets. Even a 1 TB database only takes us ~1 hour to back up or restore, which is well within our SLA.

On 12/20/2013 11:42 AM, Marouf, Nick wrote:

Hi Skylar!
Yes, that would be the easy way to do it; there is an option to rebalance the I/O after you add the new file systems to the database. I had already set up TSM before the performance tuning guideline was released. Doing it this way will require more storage initially, and running the db2 rebalancing command line tools will spread out the DB I/O load.

Using IBM XIVs that can handle very large I/O requests, in our specific case there was no need to provide physically separate volumes. I've seen one TSM instance crank upwards of 10,000 IOPS, leaving an entire ESX cluster in the dust.

--
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
Re: Deduplication number of chunks waiting in queue continues to rise?
Hi Wanda,

I'm using deduplication and have found that TSM life would be much easier if the stg pool were kept smaller, under 3 TB in size. I haven't done enough testing with this, and I know it is slightly counterproductive for achieving the highest deduplication savings, but it sure does make the administrative side much cleaner and easier to work with. Keeping the storage pool smaller makes offsite copies and reclamation faster. It is divide and conquer. I do have one storage pool that takes longer to reclaim, and this seems to clear up every Sunday, as we generally have very little incoming client backup on that day.

I'm using client-side deduplication to leverage the client processing, and in order to protect against stale or duplicate chunk collisions I keep the cache databases set very small, so that the clients on average have to reset that cache every few days.

Deduplication is very important for me. Please keep me in the loop when you come closer to a resolution.

tsm: TSMC04P> SHOW DEDUPDELETEINFO

Dedup Deletion General Status
Number of worker threads : 8
Number of active worker threads : 1
Number of chunks waiting in queue : 1967534

tsm: TSMC04P> q db f=d

Database Name: TSMDB1
Total Size of File System (MB): 1,148,760
Space Used by Database (MB): 304,105
Free Space Available (MB): 6,755,930
Total Pages: 33,625,117
Usable Pages: 33,623,517
Used Pages: 33,620,309
Free Pages: 3,208
Buffer Pool Hit Ratio: 98.0
Total Buffer Requests: 21,032,020,059
Sort Overflows: 0
Package Cache Hit Ratio: 99.8
Last Database Reorganization: 12/16/2013 18:21:53
Full Device Class Name: 3592DEV
Number of Database Backup Streams: 1
Incrementals Since Last Full: 0
Last Complete Backup Date/Time: 12/19/2013 10:12:53

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Erwann Simon
Sent: Friday, December 20, 2013 12:33 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: [ADSM-L] Deduplication number of chunks waiting in queue continues to rise?

Hi Wanda,

Expire Inventory is queuing chunks for deletion. See the Q PROC output when, at the end of the expire inventory process, the total number of nodes has been reached: no more deletion of objects occurs, but SHOW DEDUPDELETEINFO shows that the deletion threads are still working, queuing and deleting chunks. This activity does not appear externally and consumes most of the expire inventory time. Let's try with deduplication disabled (dedup=no) for that pool (?).

Regards,
Erwann
--
Erwann SIMON
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: Deduplication number of chunks waiting in queue continues to rise?
Wanda,

In trying to troubleshoot an unrelated performance PMR, IBM provided me with an e-fix for the dedupdel bottleneck that it sounds like you're experiencing. They obviously will want to do their due diligence on whether or not this e-fix will help solve your problems, but it has proved very useful in my environment. They even had to compile a Solaris e-fix for me, because it seems like I'm the only one running TSM on Solaris. The e-fix was very simple to install. What you don't want to do is go to 6.3.4.2, unless they tell you to because the e-fix is for that level (207). Don't run on 6.3.4.2 for even a minute; only install it to get to the e-fix level.

Dedupdel gets populated by anything that deletes data from the stgpool, i.e. move data, expire inventory, delete filespace, move nodedata, etc. We run client-side dedupe (which works pretty well, except when you run into performance issues on the server), so our identifies don't run very long, if at all. It might save you time to run client-side dedupe.

BTW, when I finally got this e-fix and TSM was able to catch up with the deletes and reclaims as it needed to, I got some serious space back in my TDP dedup pool. It went from 90% util to 60% util (with about 10 TB of total capacity). What finally really got me before the fix was that I had to delete a bunch of old TDP MSSQL filespaces, and it just took forever for TSM to catch up. I have a few deletes to do now, and I'm a bit wary because I don't want to hose my server again.

I would escalate with IBM support and have them supply you the e-fix. 6.3.4.3 I don't think is slated for release any time within the next few days, and you'll just be struggling to deal with the performance issue.

HTH,
Sergio

On 12/19/13 11:35 PM, Prather, Wanda <wanda.prat...@icfi.com> wrote:

TSM 6.3.4.00 on Win2K8

Perhaps some of you that have dealt with the dedup chunking problem can enlighten me. TSM/VE backs up to a dedup file pool, about 4 TB of changed blocks per day. I currently have more than 2 TB (yep, terabytes) of volumes in that file pool that will not reclaim. We were told by support that when you do:

SHOW DEDUPDELETEINFO

the number of chunks waiting in queue has to go to zero for those volumes to reclaim. (I know that there is a fix at 6.3.4.200 to improve the chunking process, but that has been APARed, and we're waiting on 6.3.4.300.)

I have shut down IDENTIFY DUPLICATES and reclamation for this pool. There are no clients writing into the pool; we have redirected backups to a non-dedup pool for now to try and get this cleared up. There is no client-side dedup here, only server-side. I've also set deduprequiresbackup to NO for now, although I hate doing that, to make sure that doesn't interfere with the reclaim process.

But SHOW DEDUPDELETEINFO shows that the number of chunks waiting in queue is *still* increasing. So, WHAT is putting stuff on that dedup delete queue? And how do I ever gain ground?

W

**Please note new office phone:
Wanda Prather | Senior Technical Specialist | wanda.prat...@icfi.com | www.icfi.com
ICF International | 443-718-4900 (o)
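One way to see whether the queue is actually draining or still growing is to poll the same SHOW command from a script; a rough sketch (admin ID, password, and interval are placeholders):

  while true; do
    # print only the queue-depth line from the undocumented SHOW command
    dsmadmc -id=admin -password=xxxxx -dataonly=yes "show dedupdeleteinfo" | grep -i "waiting in queue"
    sleep 300
  done

Plotting the numbers over a day makes it obvious whether expiration, move data, or filespace deletes are outrunning the deletion worker threads.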
Re: Deduplication number of chunks waiting in queue continues to rise?
Sergio and Wanda,

Thanks for your posts! I opened PMR 10702,L6Q,000 a couple weeks ago for slow performance [recently completely fell off the cliff!] with our SRV3 TSM v6.3.4.200 service that *was* successfully doing client+server deduplication for a 72 TB BackupDedup STGpool on NetApp FC [soon to be 3PAR] FC disks. I did not previously know about this command... SHow DEDUPDelinfo now shows 7M enqueued dedupdel chunks @ SRV3 TSM. I just requested escalation to consider whether the TSM v6.3.4.207 e-fix will help us.

Thanks again... hoping to re-post with better performance results soon!
jim.o...@yale.edu (w#203.432.6693, c#203.494.9201, h#203.387.3030)
Re: Deduplication number of chunks waiting in queue continues to rise?
Woo hoo! That's great news. Will open a ticket and escalate. Also looking at client-side dedup, but I have to do some architectural planning, as all the data is coming from one client, the TSM/VE data mover, which is a VM.

Re client-side dedup: do you know if there is any cooperation between client-side dedup and deduprequiresbackup on the server end? I have assumed that client-side dedup would not offer that protection.

W
So, WHAT is putting stuff on that dedup delete queue? And how do I ever gain ground? W **Please note new office phone: Wanda Prather | Senior Technical Specialist | wanda.prat...@icfi.com | www.icfi.com ICF International | 443-718-4900 (o)
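For anyone in Wanda's situation who wants to see whether the queue is draining and whether reuse delay is what is pinning the volumes, the commands below are a minimal sketch from the admin client. DEDUPPOOL is a hypothetical pool name, and dropping REUSEDELAY to 0 gives up the safety margin for restoring the database to an earlier point, so it only makes sense in a cleanup situation like this one:

  /* Volumes held by REUSEDELAY sit in pending state; look for them first */
  query volume stgpool=DEDUPPOOL status=empty,pending
  /* The Reuse Delay Period shows in the detailed pool output */
  query stgpool DEDUPPOOL format=detailed
  /* 0 lets pending volumes return to scratch as soon as the queue drains */
  update stgpool DEDUPPOOL reusedelay=0
  /* The queue itself is only visible through the undocumented show command */
  show dedupdeleteinfo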
Re: Deduplication number of chunks waiting in queue continues to rise?
Please do post results - expiration just ran for me, queue 30M! 45 TB dedup pool.

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of James R Owen Sent: Friday, December 20, 2013 11:19 AM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication number of chunks waiting in queue continues to rise?

Sergio and Wanda, Thanks for your posts! I opened PMR 10702,L6Q,000 a couple weeks ago for slow performance [recently completely fell off the cliff!] with our SRV3 TSM v6.3.4.200 service that *was* successfully doing client+server deduplication for a 72TB BackupDedup STGpool on NetApp FC [soon to be 3par] FC disks. I did not previously know about this command... SHow DEDUPDelinfo now shows 7M enqueued dedupdel chunks @ SRV3 TSM. I just requested escalation to consider whether the TSM v6.3.4.207 efix will help us. Thanks again... hoping to re-post with better performance results soon! jim.o...@yale.edu (w#203.432.6693, c#203.494.9201, h#203.387.3030)

On 12/20/2013 10:38 AM, Sergio O. Fuentes wrote: [...]
Re: Deduplication number of chunks waiting in queue continues to rise?
Client-side dedup and simultaneous-write to a copy pool are mutually exclusive. You can't do both, which is the only theoretical way to enforce deduprequiresbackup with client-side dedup. I suppose IBM could enhance TSM to do a simultaneous-like operation with client-side dedup, but that's not available now. So, I'm not sure how the TSM server enforces deduprequiresbackup with client-side dedup. Ever since 6.1 I have always set that to NO anyway.

I have dealt with the repercussions of that as well. Backup stgpool on dedup'd stgpools is not pretty. I have made some architectural changes to the underlying stgpools and the 'backup stgpools' run pretty well, even with 1TB SATA drives. Two things I think helped quite a bit:

1. Use big predefined volumes. My new volumes are 50GB.
2. Use many filesystems for the devclass. I have 5 currently. I would use more if I had the space.

Thanks! Sergio

On 12/20/13 11:35 AM, Prather, Wanda wanda.prat...@icfi.com wrote: [...]
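A sketch of what Sergio's two changes look like as server commands; the device class name, directories, pool name, and sizes here are illustrative assumptions, not his actual configuration:

  /* One FILE device class striped over several filesystems */
  define devclass BIGFILE devtype=file maxcapacity=50g mountlimit=64 directory=/tsmfile/fs1,/tsmfile/fs2,/tsmfile/fs3,/tsmfile/fs4,/tsmfile/fs5
  /* Pre-formatted 50 GB volumes (FORMATSIZE is in MB) instead of scratch */
  define volume DEDUPPOOL /tsmfile/fs1/vol00001.bfs formatsize=51200
  define volume DEDUPPOOL /tsmfile/fs2/vol00002.bfs formatsize=51200

Predefined volumes avoid the allocate-and-format overhead of scratch volumes during the backup window, and spreading the device-class directories across filesystems lets parallel sessions land on separate spindles.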
Re: Deduplication number of chunks waiting in queue continues to rise?
I can second that, Sergio. Backup stgpool to copy tapes is not pretty, and it is an intensive process to rehydrate all that data. The one extra thing I did was split the database across multiple folders for parallel I/O to the database. That has worked out very well, and I currently have it set up to span across 8 folders, with an XIV backend that can take a beating.

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Sergio O. Fuentes Sent: Friday, December 20, 2013 12:04 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication number of chunks waiting in queue continues to rise? [...]
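On TSM 6.x the directory split Nick describes is done with EXTEND DBSPACE; the paths below are placeholders, and at this level the command adds directories without restriping existing data by itself (see the next message for the dump-and-restore alternative). The DB2 step is my assumption about what "rebalancing command line tools" refers to, so treat it as a sketch to confirm with support:

  /* Add more directories (DB2 storage paths) to the server database */
  extend dbspace /tsmdb/fs5,/tsmdb/fs6,/tsmdb/fs7,/tsmdb/fs8

  /* Then, as the DB2 instance owner at the OS shell: */
  db2 connect to TSMDB1
  db2 list tablespaces
  db2 "alter tablespace USERSPACE1 rebalance"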
Re: Deduplication number of chunks waiting in queue continues to rise?
While we don't do deduplication (tests show we gain less than 25% from it), we also split our DB2 instances across multiple, physically-separate volumes. The one thing to note is that you have to dump and restore the database to spread existing data across those directories if you add them post-installation.

On Fri, Dec 20, 2013 at 02:23:34PM -0500, Marouf, Nick wrote: [...]
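For the dump-and-restore route Skylar mentions, the outline below is a hedged sketch of the TSM 6.x procedure, not a tested recipe; the device class and file names are placeholders, and every step should be verified against the DSMSERV utilities reference for your level before touching a production server:

  /* 1. Take a full database backup from the admin client */
  backup db devclass=FILEDEV type=full
  /* 2. Halt the server; then, as the instance user, drop the DB2 database */
  dsmserv removedb TSMDB1
  /* 3. Restore with ON= pointing at a file that lists ALL database       */
  /*    directories, old and new, so DB2 lays data across the full set    */
  dsmserv restore db on=dbdirs.txt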
Re: Deduplication number of chunks waiting in queue continues to rise?
Hi All, Is anyone using this script for reporting purposes? http://www-01.ibm.com/support/docview.wss?uid=swg21596944 -- Best regards / Cordialement / مع تحياتي Erwann SIMON

----- Original Message ----- From: Wanda Prather wanda.prat...@icfi.com To: ADSM-L@VM.MARIST.EDU Sent: Friday, 20 December 2013 05:35:38 Subject: [ADSM-L] Deduplication number of chunks waiting in queue continues to rise? [...]
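Short of that script, two quick checks give a rough dedup-savings picture from the admin client. DEDUPPOOL is a placeholder, and the REPORTING_MB column is only present at 6.3-era levels, so take this as a sketch:

  /* 'Duplicate Data Not Stored' appears in the detailed pool output */
  query stgpool DEDUPPOOL format=detailed
  /* Logical vs. reporting occupancy approximates per-pool savings */
  select stgpool_name, sum(logical_mb) as logical_mb, sum(reporting_mb) as reporting_mb from occupancy where stgpool_name='DEDUPPOOL' group by stgpool_name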
Re: Deduplication number of chunks waiting in queue continues to rise?
Is anyone doing stgpool backups to a dedup file copy pool?

At 02:23 PM 12/20/2013, Marouf, Nick wrote: [...]
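For reference, the operation being asked about, with hypothetical pool names; the question is whether anyone points it at a copy pool that is itself a deduplicated FILE pool:

  backup stgpool DEDUPPOOL COPYPOOL maxprocess=4

Note Sergio's and Nick's warnings above: when the target is tape, the deduplicated data is rehydrated on the way out, which is what makes the operation so expensive.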
Re: Deduplication number of chunks waiting in queue continues to rise?
Hi Skylar! Yes, that would be the easy way to do it; there is an option to rebalance the I/O after you add the new file systems to the database. I had already set up TSM before the performance tuning guideline was released. Doing it this way will require more storage initially, and running the DB2 rebalancing command-line tools will spread out the DB I/O load. Using IBM XIVs, which can handle very large I/O requests, in our specific case there was no need to provide physically-separate volumes. I've seen one TSM instance crank upwards of 10,000 IOPS, leaving an entire ESX cluster in the dust.

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Skylar Thompson Sent: Friday, December 20, 2013 2:28 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication number of chunks waiting in queue continues to rise? [...]
Re: Deduplication number of chunks waiting in queue continues to rise?
Hi Wanda, some quick rambling thoughts about dereferenced chunk cleanup.

Do you know about the 'show banner' command? If IBM sends you an e-fix, this will tell you what it is fixing.

  tsm: X> show banner
  * EFIX Cumulative level 6.3.4.207
  * This is a Limited Availability TEMPORARY fix for
  * IC94121 - ANR2033E DEFINE ASSOCIATION: Command failed - lock con
  *           when def assoc immediately follows def sched.
  * IC95890 - Allow numeric volser for zOS Media server volumes.
  * IC93279 - Redrive failed outbound replication connect requests.
  * IC93850 - PAM authentication login protocol exchange failure
  * wi3187  - AUDIT LIBVOLUME new command
  * IC96637 - SERVER CAN HANG WHEN USING OPERATION CENTER
  * IC95938 - ANRD_2644193874 BFCHECKENDTOEND DURING RESTORE/RET
  * IC96993 - MOVE NODEDATA OPERATION MIGHT RESULT IN INVALID LINKS
  * IC91138 - Enable audit volume to mark one more kind invalid link
  *           THE RESTARTED RESTORE OPERATION MAY BE SINGLE-THREADED
  *           Avoid restore stgpool linking to orphaned base chunks
  * WI3236  - Oracle T1D tape drive support
  * 94297   - Add a parameter DELETEALIASES for DELETE BITFILE utili
  * IC96462 - Mount failure retry for zOS Media server tape volumes.
  * IC96993 - SLOW DELETION OF DEREFERENCED DEDUPLICATED CHUNKS
  * This cumulative efix server is based on code level
  * made generally available with FixPack 6.3.4.200

I have 2 servers on 6342.006 and 2 on 6342.007. I have the .009 efix waiting to be installed on my biggest, oldest, baddest server to fix the chunks-in-queue problem. On 3 servers, the queue is down to 0, and they usually run without a problem. On the big bad one, here are the stats -

  tsm: WIN1> show dedupdeleteinfo
  Dedup Deletion General Status
    Number of worker threads : 15
    Number of active worker threads : 1
    Number of chunks waiting in queue : 11326513
  Dedup Deletion Worker Info
    Dedup deletion worker id : 1
    Total chunks queued : 0
    Total chunks deleted : 0
    Deleting AF Entries? : Yes
    In error state? : No
  Worker threads 2 through 15 are not active
  --
  Total worker chunks queued : 0
  Total worker chunks deleted : 0

The cleanup of reclaimed volumes is done by the thread which has 'Deleting AF Entries? : Yes'. The pending efix is supposed to get this process to finish. It never finishes on this server, something about a bad access plan. When I have a lot of volumes which are empty but won't delete, I generate MOVE DATA commands for them. A MOVE DATA to the same pool will manually do what the chunk cleanup process is trying to do.

Regards, Bill Colwell Draper Lab

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Prather, Wanda Sent: Thursday, December 19, 2013 11:36 PM To: ADSM-L@VM.MARIST.EDU Subject: Deduplication number of chunks waiting in queue continues to rise? [...]
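A sketch of the select trick Bill describes for generating those MOVE DATA commands; the pool name and the empty-volume test are assumptions, so review the generated list before feeding it back in as a macro:

  select 'move data ' || volume_name from volumes where stgpool_name='DEDUPPOOL' and pct_utilized=0 and status='FULL'

A MOVE DATA with no target pool moves a volume's remaining contents within the same pool, which manually drives the cleanup the deletion thread is failing to finish.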
Re: Deduplication number of chunks waiting in queue continues to rise?
Hi Wanda, Expire Inventory is queuing chunks for deletion. Watch the Q PROCESS output: at the end of the expire inventory process, once the total number of nodes has been reached, no more deletion of objects occurs, but SHOW DEDUPDELETEINFO shows that the deletion threads are still working, queuing and deleting chunks. This activity does not appear externally and consumes most of the expire inventory time. You could try it with deduplication disabled (dedup=no) for that pool, perhaps? Regards, Erwann

Prather, Wanda wanda.prat...@icfi.com wrote: [...]

-- Erwann SIMON. Sent from my Android phone with K-9 Mail. Please excuse my brevity.
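A minimal way to observe the hidden phase Erwann describes; the RESOURCE value is just an example, and which expiration parameters exist varies by level, so check yours:

  expire inventory resource=4 wait=no
  query process
  /* Run repeatedly after Q PROCESS reports the node count complete */
  show dedupdeleteinfo

If the queue keeps climbing after the process has finished with every node, the chunk deletion is the invisible tail of expiration, exactly as described above.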
Re: Deduplication/replication options
No, this is correct: IBM does give ProtecTier (for example) customers an advantage with deduplication and factors in the dedup for billing.

On Wed, Jul 24, 2013 at 10:18 PM, Colwell, William F. bcolw...@draper.com wrote:

Hi Norman, that is incorrect. IBM doesn't care what the hardware is when measuring used capacity in the Suite for Unified Recovery licensing model. A description of the measurement process and the SQL to do it is at http://www-01.ibm.com/support/docview.wss?uid=swg21500482 Thanks, Bill Colwell Draper Lab

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Gee, Norman Sent: Wednesday, July 24, 2013 11:29 AM To: ADSM-L@VM.MARIST.EDU Subject: Re: Deduplication/replication options [...]

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Loon, EJ van - SPLXM Sent: Tuesday, July 23, 2013 11:59 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: Deduplication/replication options [...]

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Sergio O. Fuentes Sent: Tuesday, 23 July 2013 19:20 To: ADSM-L@VM.MARIST.EDU Subject: Deduplication/replication options

Hello all, We're currently faced with a decision: go with a dedupe storage array or with TSM dedupe for our backup storage targets. There are some very critical pros and cons going with one or the other. For example, TSM dedupe will reduce overall network throughput both for backups and replication (source-side dedupe would be used). A dedupe storage array won't do that for backup, but it would be possible if we replicated to an identical array (though TSM replication would be bandwidth intensive). TSM dedupe might not scale as well and may necessitate more TSM servers to distribute the load. Overall, though, I think the cost of additional servers is way less than what a native dedupe array would cost, so I don't think that's a big hit. Replication is key. We have two datacenters where I would love it if TSM replication could be used in order to quickly (still manually, though) activate the replication server for production if necessary. Having a dedupe storage array kind of removes that option, unless we want to replicate the whole rehydrated backup data via TSM. I'm going on and on here, but has anybody had to make a decision to go one way or the other? Would it make sense to do a hybrid deployment (combination of TSM dedupe and array dedupe)? Any thoughts or tales of woe and forewarnings are appreciated. Thanks! Sergio
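The docview link above carries the official measurement SQL; as a rough, unofficial approximation of the front-end number it produces, primary-pool occupancy can be summed like this (table and column names are from the standard TSM 6 SQL schema, but this is a sketch, not the licensed measurement):

  select p.stgpool_name, sum(o.logical_mb)/1024 as logical_gb from occupancy o, stgpools p where o.stgpool_name=p.stgpool_name and p.pooltype='PRIMARY' group by p.stgpool_name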
Re: Deduplication/replication options
Hello Stefan, Have you got cases of this? I ask because I have been specifically told by our rep that any dedupe saving for capacity licensing is TSM dedupe only, regardless of the backend storage.

On 26 July 2013 09:16, Stefan Folkerts stefan.folke...@gmail.com wrote: [...]
Re: Deduplication/replication options
On Jul 26, 2013, at 5:21 AM, Steven Langdale steven.langd...@gmail.com wrote: [...]

During our last TSM license renewal negotiations, we were also undergoing a refresh of our storage for TSM. We heard the same thing Stefan heard: use-based pricing could be reduced by using TSM deduplication or using deduplication in ProtecTier, but there would be no reduction for deduplication effects in non-IBM hardware, such as the DataDomains we use. On the other hand, IBM often reminds us that terms and conditions may vary from region to region and probably from customer to customer, so it's entirely possible that all of us are telling the truth as it applies to us and that the only answer that works for everyone is, "See the company you license TSM from or through." However, that conversation may go better for you if you know what others live with. Nick
Re: Deduplication/replication options
Yes I do, but I cannot share the names with people outside of my company, sorry. I'll tell you it's a mid-sized company with two ProtecTiers in two locations that replicate; the customer has the entry-level TB license model, and IBM used the ProtecTier interface to determine the dedup savings for the license, if I am not mistaken. It could very well be what Nick wrote, but I would tell your IBM rep that you have read about several cases and request that they ask again.

On Fri, Jul 26, 2013 at 12:21 PM, Steven Langdale steven.langd...@gmail.com wrote: [...]
Re: Deduplication/replication options
I'm not sure, but I suspect that IBM is more willing to use better special-bid pricing for a customer who is using both TSM and ProtecTier. I.e., it's a matter of negotiation, not rigid rules about how to measure occupancy. I don't think there is anything in the occupancy calculation algorithm to compensate for ProtecTier deduplication. Perhaps an IBMer or Business Partner who is monitoring this list could chime in on this?

At 06:21 AM 7/26/2013, Steven Langdale wrote: [...]
Re: Deduplication/replication options
Hi Sergio! Another thing to take into consideration, if you have switched from PVU licensing to sub-capacity licensing in the past: TSM sub-capacity licensing is based on the amount of data stored in your primary pool. If this data is stored on a de-duplicating storage device, you will be charged for the gross amount of data. If you are using TSM de-duplication, you will have to pay only for the de-duplicated amount. This will probably save you a lot of money... Kind regards, Eric van Loon AF/KLM Storage Engineering

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Sergio O. Fuentes Sent: Tuesday, 23 July 2013 19:20 To: ADSM-L@VM.MARIST.EDU Subject: Deduplication/replication options [...]
Re: Deduplication/replication options
On 07/23/2013 06:30 PM, Nick Laflamme wrote: I'm surprised by Allen's comments, given the context of the list. TSM doesn't support BOOST. It doesn't support it at the server level, and it doesn't support it for a client writing directly to a Data Domain DDR. Duh, yes, good point. Context: we moved our Oracle backups off of TSM entirely. - Allen S. Rout
Re: Deduplication/replication options
This is why IBM is pushing their VTL solution. IBM will only charge for the net amount using an all-IBM solution. At least that is what I was told.
Re: Deduplication/replication options
Hi Norman, that is incorrect. IBM doesn't care what the hardware is when measuring used capacity in the Suite for Unified Recovery licensing model. A description of the measurement process and the SQL to do it is at http://www-01.ibm.com/support/docview.wss?uid=swg21500482 Thanks, Bill Colwell Draper Lab
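For anyone who just wants a ballpark without working through the docview page: the front-end numbers come out of the OCCUPANCY table, so a select along these lines (my own sketch using the standard OCCUPANCY columns, not IBM's official measurement script) gets you in the neighborhood:

  select stgpool_name, sum(logical_mb)/1024 as logical_gb from occupancy group by stgpool_name

The documented process is more involved (it distinguishes data types and primary pools), so use the official SQL for anything you actually report to IBM.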
Re: Deduplication/replication options
Hi Sergio, There are many people more knowledgeable than I am on this topic, and I hope they contribute to this interesting question. My two cents would be to remember that the TSM database doesn't know about array replication, so you'll have to deal with that issue if you have a massive recovery event. My other two cents: be certain that whichever solution you choose works as described. You can't test the alternatives enough. Best wishes, Keith Arbogast Indiana University
Re: Deduplication/replication options
On 07/23/2013 01:19 PM, Sergio O. Fuentes wrote: We're currently faced with a decision to go with a dedupe storage array or with TSM dedupe for our backup storage targets. There are some very critical pros and cons going with one or the other. For example, TSM dedupe will reduce overall network throughput both for backups and replication (source-side dedupe would be used). A dedupe storage array won't do that for backup, Not so. There's a driver-ish package from EMC, associated with the Data Domain product line, called Boost. Boost shoves dedupe work from the central device out to the client box, distributing CPU work and saving network traffic. There may be other similar offerings, but Data Domain is what we've got, so it's what I know. We're not using Boost; our primary use for the DD is for Oracle backups, and our DBAs are far more interested in the conventional filesystem interface than they are in the network savings. But if you find the bandwidth between client and device to be a serious bottleneck, there's an option. Replication is key. We have two datacenters where I would love it if TSM replication could be used in order to quickly (still manually, though) activate the replication server for production if necessary. Having a dedupe storage array kind of removes that option, unless we want to replicate the whole rehydrated backup data via TSM. I intend to go the same direction you are intending to go. But I'm not there yet. I hope to have some results on this before September. Would it make sense to do a hybrid deployment (combination of TSM dedupe and array dedupe)? Any thoughts or tales of woe and forewarnings are appreciated. Only thoughts, not tales yet. But I'm planning to experiment with dedupe both at the TSM level and at the storage array level. I've heard several rumors that the Data Domain can dedupe even already-deduped backups (e.g. VEEAM) with very good ratios. I'm going to try a similar theory with the DD and TSM-deduped stgpools. - Allen S. Rout
Re: Deduplication/replication options
I'm using Data Domain as the only dedup component. Mgmt is balking at the cost of additional disk or tape pools with TSM dedup and the highly desired backup to a non-dedup pool. Our current tape technology is quite old, and replacing it with several new drives and library hardware isn't on the financial agenda. We have Data Domain in two data centers. A TSM pool on the DD is replicated to the alternate DD via DD replication. It replicates the de-duped data, so latency/bandwidth is less of an issue. A second TSM server will see the other DD as a pool, or at least that's the plan; haven't fully tested yet. Had to carefully define the device class to make sure the path name is identical on both ends. Will have to stop DD replication, at least temporarily, to test it.
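For what it's worth, the "identical path" requirement boils down to the FILE device class naming the same directory string on both servers; a minimal sketch, with made-up names and sizes:

  define devclass ddfile devtype=file maxcapacity=50G mountlimit=64 directory=/dd/tsmpool
  define stgpool ddpool ddfile maxscratch=500

If the mount point differs between the sites, the standby server's catalog will point at volume paths it can't open.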
Re: Deduplication/replication options
I have no experience with TSM de-dup, but I have plenty with Data Domain. We have 3 different disaster recovery methods for 3 sites. 1. The largest site is traditional TSM: write the data to a primary pool (DD VTL), make copies to physical tape and use a truck to move them away. 2. The medium site has a secondary DC 60 miles away and plenty of bandwidth. That site does DD-to-DD replication and disk array replication of the TSM disks. This results in a DR time of about 1 hour to start up the DR location. No copy tape. 3. A small site with DD-to-DD replication and a cold TSM instance at the DR site that will restore from the replicated DD. No copy tape. #1 is the safest because it uses copy tape. #2 and #3 get the data to safety the fastest. My preference is DD-to-DD replication plus physical copy tapes of the production stuff. We have had corrupt tapes in the DD due to the DD crashing. This is rare, but does happen. Both are valid depending on the amount of money you have to spend. We did not look at TSM de-dup because our servers are out of cycles already. Be sure to test any DR plans if you are using DD VTL. There are some interesting problems with serial number changes. Andy Huebner
Re: Deduplication/replication options
I'm surprised by Allen's comments, given the context of the list. TSM doesn't support BOOST. It doesn't support it at the server level, and it doesn't support it for a client writing directly to a Data Domain DDR. This may be obvious to everyone, but I fear for the people who are TSM-centric and haven't gotten to the point of bypassing TSM in some instances. Also, BOOST is a feature with a price, just like the VTL support and replication are. I'm not saying that's bad, only that you have to factor it in. As for us, we don't use copy storage pools with our Data Domains; we make sure our TSM servers use replicated disk and write to replicated storage, and if we ever lose our primary data center, we'll restore at our DR site and pick up with the replicated DDR storage. As others have noted, this leaves us vulnerable to any corruption due to DDR crashes or server crashes that confuse the DDR, but management signed off on that. We briefly experimented with running copy pools on the same DDR to have diversity in how data was arranged, but the growth in the size of our TSM databases and a surprisingly poor dedupe rate for a second copy on the same DDR doomed that initiative. Nick
Re: Deduplication/replication options
Thanks, guys, for your input. Nick, your comment is relevant to us. We're not used to bypassing TSM for any storage management task regarding backups. We use very little storage-based replication in our environment as it is, and introducing array-based replication adds a wrinkle to managing our backup retention and storage policies. Most of our application managers have decided to use application-based replication or clustering (Data Guard for Oracle, AlwaysOn for MSSQL, DAGs for Exchange, etc.). Stands to reason that we would try to use TSM replication for backups. I do like the idea of trying to squeeze out more disk space by compressing or deduplicating TSM deduplication extents. FYI, we had our business partner try this with compressing TSM deduplication extents on a V7000 array. According to them, IBM has tried this as well. The result is not sufficient to cover the cost of the V7000 compression license (20% compression on average of TSM deduplication extents), so IBM has said it's not a recommended practice for the V7000 array. I even had a DD POC sitting on the floor for some time and I didn't think to try it on TSM dedupe extents. Out of curiosity, has anyone experimented with ZFS compression on a TSM storage pool? Might be a low-cost option, but I'm not sure how scalable or stable it is. The options are numerous, our man-power is limited. Thanks for your help! SF
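If anyone does want to try the ZFS idea, the mechanics at least are trivial; a sketch with a hypothetical pool/dataset name (lz4 if your ZFS build has it, lzjb otherwise):

  zfs create -o compression=lz4 tank/tsmfilepool
  zfs get compressratio tank/tsmfilepool

Point a FILE device class DIRECTORY at the dataset and compressratio shows what you're getting; whether TSM dedup extents compress any better there than on the V7000 is exactly the open question.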
Re: Deduplication candidates
Though our TSM systems (6.3 and 5.5) use back-end de-dup (Data Domain), I also notice that log files for DBs such as Exchange pre-2010 using legacy backups, and DB2 log files, de-dup very poorly. Originally I thought that our DBAs or Exchange admins were either compressing this data or storing it on compressed volumes, but I found no evidence of it. After seeing this conversation and giving it further thought, I wonder if others experience poor de-dup rates on these data types? Thanks ~Rick -Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of bkupmstr Sent: Friday, January 11, 2013 7:15 AM To: ADSM-L@VM.MARIST.EDU Subject: [ADSM-L] Deduplication candidates Thomas, First off, with all the great enhancements and current high stability levels I would recommend going straight to version 6.4. As you have already stated, there are certain data types that are good candidates for data deduplication: your database backup data definitely is, and image files definitely aren't. From my experience, Oracle export files are traditionally good dedupe candidates also. From what you describe, the SQL backup data minus the compression would also be a good candidate. The one thing you do not mention is how many versions of this backup data you are keeping. From my experience, unless you are keeping a minimum of 4 backup versions, the dedupe ratios will suffer. Too many times I see folks keeping only 2 backup versions and they can't understand why they get very poor dedup rates. Also be aware that with TSM deduplication you will have to ensure that you write the backup data to a target disk pool that has good enough performance to not negatively impact backup speed.
Re: Deduplication candidates
Yep. Oracle DB's, getting great dedup rates on the DB's (except the ones where they have turned on Oracle compression to start with - that is, the DB itself is compressed). Poor dedup on the Oracle logs either way. W
Re: Deduplication candidates
I second Wanda on the logs. When you think about it, logs are unique data, being entirely made of transactions in the order in which they come in. If they were identical to some other data, I'd start looking around for Twilight Zone cameras. On the other hand, I suppose I could imagine a test harness issuing the exact same set of transactions to a test system multiple times.
Re: Deduplication candidates
Also, the Files Per Set parameter in Oracle will really get you - ProtecTier recommends a setting of no more than 4. We have seen 10, and we went from 10:1 to 2.5:1.
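For reference, that's RMAN's FILESPERSET; capping it looks something like this (the value 4 per the ProtecTier guidance above, the rest of the script a placeholder):

  RUN {
    BACKUP DATABASE FILESPERSET 4;
  }

The reason it hurts dedupe: with a high FILESPERSET, RMAN multiplexes many datafiles into each backup piece, so the byte stream interleaves differently on every run and the dedupe engine finds far fewer repeated chunks.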
Re: Deduplication candidates
Thanks Wanda and Alex. Yes, I too thought about the uniqueness of the data that makes up logs; I guess I'm just second-guessing myself. One approach I am thinking about, in regard to the same issue with pre-Exchange 2010 log files (legacy incrementals), is whether it wouldn't be better to just do full backups. Aside from the time-to-completion, overall storage requirements may be the same, and it would in most cases speed recovery. ~Rick
Re: Deduplication candidates
Interesting idea -- Let us know what you find out!
Re: Deduplication candidates
Yes, I agree with you. I can't think of a reason why most of the database shouldn't dedup out.
Re: Deduplication with TSM.
The size of storage is not enough information to size a system. The number of sessions determines system size. If you have four clients at 1 GB per night, you could run 8GB RAM, Core2 2GHz and be okay. Realistically, 32GB per instance is good. db2sysc will use about 20GB per instance if it's available. This handles a couple hundred clients, deduplication, etc. As for processors, one processor for every 2 DB directories, plus one processor for TSM internals, is minimal. If you will have high I/O, then one hardware thread for every IDENTIFY DUPLICATES process is good. If you will use client-side dedupe most of the time, then you only use IDENTIFY DUPLICATES when you move data into the pool server-side. Higher GHz matters for the identify processes, though branch prediction is still important (POWER5 or POWER7 are better than POWER6). Higher hardware thread counts matter for client session responsiveness (POWER7 uses fewer cores than POWER6/POWER5). A higher I/O backplane matters for the amount of raw data coming in (low-end POWER wins over low-end Intel). Higher IOPS for the DB volumes are necessary to keep clients from slowing down (hash compares). Lower network latency matters for client performance (hash compares). With friendly regards, Josh-Daniel S. Davis OmniTech Industries On Thu, Apr 26, 2012 at 8:43 AM, Francisco Molero fmol...@yahoo.com wrote: Hi colleagues, I am going to implement a very big disk pool with dedup, around 100 TB, as a TSM disk storage pool (neither VTL nor DataDomain). Does anybody know what TSM server I need (RAM and CPU) or what ratio I can hope for... I am thinking about source dedup... Any experiences? Thanks..
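For reference, the knob Josh is sizing those hardware threads against is the number of identify processes on the deduplicated FILE pool; a sketch with invented names:

  define stgpool bigdedup filedev maxscratch=2000 deduplicate=yes identifyprocess=4
  identify duplicates bigdedup numprocess=8 duration=480

Per his rule of thumb, each identify process wants a hardware thread of its own while the pool is ingesting heavily.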
Re: Deduplication question
On 06/09/11 22:40, Richard van Denzel wrote: Hi All, Just a question about the internal dedup of TSM. When I dedup a storage pool and then back up the pool to a dedup copy pool, will the data in the storage pool backup be transferred deduped, or will it get undeduped first, then transferred and then deduped again? Richard. Richard, The chapter 'Managing storage pools and volumes' in the TSM 6.1 Admin Guide has a decent section on deduplicating data. We are investigating server- and client-side dedupe, but we haven't tried what you want to do in our test environment. However, the above document suggests in two places that the data will be copied/moved in its deduplicated 'state' rather than being rebuilt, copied and deduplicated again. Ian Smith Oxford University
Re: Deduplication question
Hi Richard, No, the deduplicated data is not recomposed when backing up to a deduplicated copy storage pool. Recommended reading: http://www.ibm.com/developerworks/wikis/pages/viewpage.action?pageId=108134649 and http://www.ibm.com/developerworks/wikis/display/tivolistoragemanager/Data+deduplication+best+practices+for+Tivoli+Storage+Manager+V6.2 Best regards, Andy Raibeck IBM Software Group Tivoli Storage Manager Client Product Development Level 3 Team Lead Internal Notes e-mail: Andrew Raibeck/Hartford/IBM@IBMUS Internet e-mail: stor...@us.ibm.com IBM Tivoli Storage Manager support web page: http://www.ibm.com/support/entry/portal/Overview/Software/Tivoli/Tivoli_Storage_Manager
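In other words, as long as both pools are FILE pools defined with deduplication on, the chunks travel as chunks; a minimal sketch (device class and pool names invented):

  define stgpool fileprimary fileclass maxscratch=500 deduplicate=yes
  define stgpool filecopy fileclass pooltype=copy maxscratch=500 deduplicate=yes
  backup stgpool fileprimary filecopy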
Re: Deduplication and Collocation
Back to client-side dedupe, which we're about to deploy for a branch campus 90 miles away in Rockford IL. The data is sent from the clients in Rockford via tin cans and string to the TSM server in Chicago already deduped. We're using source dedupe because the network bandwidth is somewhat limited. So if it is received into a DEVCLASS DISK stgpool, then I assume it is still deduped, because that's how it arrived. Then finally when it's migrated to tape, we've already established that it gets reinflated, and then you can collocate or not as you wish. But the question is, does this imply that deduped data CAN exist in random access DEVCLASS DISK stgpools if client-side dedupe is being used? I sure hope so, because that's what we're planning to do. Roger Deschner University of Illinois at Chicago rog...@uic.edu == You will finish your project ahead of schedule. === = (Best fortune-cookie fortune ever.) == On Tue, 21 Jun 2011, Paul Zarnowski wrote: Even if a FILE devclass has dedup turned on, when the data is migrated, reclaimed, or backed up (backup stgpool) to tape, then the files are reconstructed from their pieces. You cannot dedup on DISK stgpools. DISK implies random access disk - e.g., devclass DISK. FILE implies serial access disk - e.g., devclass FILE. But I think there is still an open question about collocation and deduplication. Deduplication must be done using FILE stgpools, but FILE stgpools CAN use collocation. I don't know what happens in this case. ..Paul At 02:38 PM 6/21/2011, Prather, Wanda wrote: If it is a file device class with dedup turned off, yes. -Original Message- From: Mark Mooney Sent: Tuesday, June 21, 2011 2:29 PM Subject: Re: [ADSM-L] Deduplication and Collocation So data is deduplicated in a disk storage pool, but when it is written to tape the entire reconstructed file is written out? Is this the same for file device classes? -- Paul Zarnowski Ph: 607-255-4757 Manager, Storage Services Fx: 607-255-8521 719 Rhodes Hall, Ithaca, NY 14853-3801 Em: p...@cornell.edu
Re: Deduplication and Collocation
As far as I know, client-side de-duplication will not work with a primary storage pool of type DISK. It must be FILE, just as for server-side de-duplication. Am I right? Grigori G. Solonovitch
Re: Deduplication and Collocation
This is my understanding as well. I'm almost certain this is the case, though we have not yet used source dedup. ..Paul
Re: Deduplication and Collocation
Agreed. AFAIK, the client-side dedup function is reliant on the dedup information in the storage pool where the data resides on the server. Which has to be a file pool, and deduped. -Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Paul Zarnowski Sent: Wednesday, June 22, 2011 7:01 AM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication and Collocation This is my understanding as well. I'm almost certain this is the case, though we have not yet used source dedup. ..Paul On Jun 22, 2011, at 3:34 AM, Grigori Solonovitch grigori.solonovi...@ahliunited.com wrote: As far as I know client site de-duplication will not work with primary storage pool DISK. It must be FILE as well like for server site de-duplication. Am I right? Grigori G. Solonovitch -Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Roger Deschner Sent: Wednesday, June 22, 2011 9:37 AM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication and Collocation Back to client side dedupe, which we're about to deploy for a branch campus 90 miles away in Rockford IL. The data is sent from the clients in Rockford via tin cans and string to the TSM server in Chicago already dedpued. We're using source dedupe because the network bandwidth is somewhat limited. So if it is received into a DEVCLASS DISK stgpool, then I assume it is still deduped, because that's how it arrived. Then finally when it's migrated to tape, we've already established that it gets reinflated, and then you can collocate or not as you wish. But the question is, does this imply that deduped data CAN exist in random access DEVCLASS DISK stgpools if client-side dedupe is being used? I sure hope so, because that's what we're planning to do. Roger Deschner University of Illinois at Chicago rog...@uic.edu == You will finish your project ahead of schedule. === = (Best fortune-cookie fortune ever.) == On Tue, 21 Jun 2011, Paul Zarnowski wrote: Even if a FILE devclass has dedup turned on, when the data is migrated, reclaimed, or backed up (backup stgpool) to tape, then the files are reconstructed from their pieces. You cannot dedup on DISK stgpools. DISK implies random access disk - e.g., devclass DISK. FILE implies serial access disk - e.g., devclass FILE. But I think there is still an open question about collocation and deduplication. Deduplication must be done using FILE stgpools, but FILE stgpools CAN use collocation. I don't know what happens in this case. ..Paul At 02:38 PM 6/21/2011, Prather, Wanda wrote: If it is a file device class with dedup turned off, yes. -Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Mark Mooney Sent: Tuesday, June 21, 2011 2:29 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication and Collocation So data is deduplicated in a disk storage pool but when it is written to tape the entire reconstructed file is written out? Is this the same for file device classes? -- Paul ZarnowskiPh: 607-255-4757 Manager, Storage Services Fx: 607-255-8521 719 Rhodes Hall, Ithaca, NY 14853-3801Em: p...@cornell.edu Please consider the environment before printing this Email. CONFIDENTIALITY AND WAIVER: The information contained in this electronic mail message and any attachments hereto may be legally privileged and confidential. The information is intended only for the recipient(s) named in this message. If you are not the intended recipient you are notified that any use, disclosure, copying or distribution is prohibited. 
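(A quick configuration sketch for anyone setting this up: client-side dedup needs a deduplicating FILE pool on the server, a node that is allowed to dedup on the client, and the client option itself. All names below are invented placeholders, and only the parameters relevant here are shown.)

   /* Server: sequential-access disk pool with dedup enabled */
   DEFINE DEVCLASS filedev DEVTYPE=FILE MAXCAPACITY=50G MOUNTLIMIT=32 DIRECTORY=/tsm/filevols
   DEFINE STGPOOL dedupfile filedev MAXSCRATCH=200 DEDUPLICATE=YES

   /* Server: allow the node to deduplicate on the client side */
   UPDATE NODE mynode DEDUPLICATION=CLIENTORSERVER

   /* Client: in dsm.sys (or dsm.opt on Windows) */
   DEDUPLICATION YES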
Re: Deduplication and Collocation
Client-side dedup is only done to a dedup storage pool, which means the storage pool has to be a FILE type storage pool.

Roger Deschner rog...@uic.edu 6/22/2011 2:37 AM wrote: Back to client-side dedupe, which we're about to deploy for a branch campus 90 miles away in Rockford IL. [...]
Re: Deduplication and Collocation
Tape pools are not de-duped, so that is not a consideration.

On Tue, Jun 21, 2011 at 13:17, Mark Mooney mmoo...@aisconsulting.net wrote: Hello, I had a student ask me today: "What happens if you have collocation turned on for a storage pool that you are deduplicating?" I did not know what to answer, because in my mind I thought: well, if the data is collocated then I need to have a copy of that data on that client's tape, otherwise I am going to be mounting another client's tape to get back a de-duped piece of data, which would negate the collocation. I'm looking at the redbooks for this, but I only see 6.1, and in 6.2 they added client-side dedup as well (which I also have questions about). Can anyone shed some light on this? Thanks! Mooney

-- Andy Carlson --- Gamecube:$150, PSO:$50, Broadband Adapter: $35, Hunters License: $8.95/month, The feeling of seeing the red box with the item you want in it: Priceless.
Re: Deduplication and Collocation
Doesn't it un-dedup when it goes to tape? Or am I still living in 5.5 and thinking in VTL dedup?

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Mark Mooney Sent: Tuesday, June 21, 2011 2:17 PM To: ADSM-L@VM.MARIST.EDU Subject: [ADSM-L] Deduplication and Collocation

Hello, I had a student ask me today: "What happens if you have collocation turned on for a storage pool that you are deduplicating?" [...]
Re: Deduplication and Collocation
So data is deduplicated in a disk storage pool, but when it is written to tape the entire reconstructed file is written out? Is this the same for file device classes?

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Andrew Carlson Sent: Tuesday, June 21, 2011 8:22 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: Deduplication and Collocation

Tape pools are not de-duped, so that is not a consideration. [...]
Re: Deduplication and Collocation
Dedup only works in TSM storage pools that reside on disk (specifically devtype=FILE pools). If you have data that goes to a dedup pool, then gets migrated off to tape, it is re-duped (rehydrated, reinflated, whatever you want to call it), so collocation will still be in effect for that tape pool.

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Mark Mooney Sent: Tuesday, June 21, 2011 2:17 PM To: ADSM-L@VM.MARIST.EDU Subject: [ADSM-L] Deduplication and Collocation

Hello, I had a student ask me today: "What happens if you have collocation turned on for a storage pool that you are deduplicating?" [...]
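(To make the pool relationship concrete: dedup is a property of the FILE pool, collocation is a property of the tape pool it migrates to, so the two settings never live on the same pool. A minimal sketch with invented names; migration is what triggers the rehydration Wanda describes.)

   DEFINE STGPOOL tapepool lto4class MAXSCRATCH=500 COLLOCATE=NODE
   DEFINE STGPOOL dedupfile filedev MAXSCRATCH=200 DEDUPLICATE=YES NEXTSTGPOOL=tapepool
   /* MIGRATE STGPOOL dedupfile (or the HIGHMIG/LOWMIG thresholds) rehydrates the
      deduped data as it moves to tapepool, where COLLOCATE=NODE then applies. */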
Re: Deduplication and Collocation
If it is a file device class with dedup turned off, yes.

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Mark Mooney Sent: Tuesday, June 21, 2011 2:29 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication and Collocation

So data is deduplicated in a disk storage pool but when it is written to tape the entire reconstructed file is written out? Is this the same for file device classes? [...]
Re: Deduplication and Collocation
And that's why storage pool planning is very important: the less re-duping, re-hydrating, and re-inflating you do, the better. Send client data to a non-deduped pool (I guess that would be a "duped" pool) that migrates to a deduped pool, but run backup stgpool before the migration happens, to avoid the rehydration. This is where I expect I'll be corrected: as long as the backup stgpool happens before the deduplication process runs on the FILE devtype storage pool, the re-duping won't have to happen. (We weren't really talking about collocated copy pools, were we?) But then you wouldn't have a FILE devtype pool migrating to tape very often anyway, would you? And if you did, that would only be in an emergency situation (i.e., you ran out of room on disk), and in that case why would you collocate? Ah, the words of someone who used to think he knew what he was talking about!

Kelly J. Lipp Elbert Colorado 719-531-5574

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Prather, Wanda Sent: Tuesday, June 21, 2011 12:27 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication and Collocation

Dedup only works in TSM storage pools that reside on disk (specifically devtype=FILE pools). [...]
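(Kelly's ordering point, expressed as a nightly admin sequence: run the copy-pool backup before duplicate identification and reclamation touch the new data, so backup stgpool copies whole files instead of reassembling chunks. The pool names are hypothetical; note also the server option DEDUPREQUIRESBACKUP, which by default refuses to remove deduplicated data that has not yet been backed up to a copy pool.)

   BACKUP STGPOOL dedupfile copytape MAXPROCESS=4           /* copy whole files first */
   IDENTIFY DUPLICATES dedupfile DURATION=240 NUMPROCESS=2  /* then find duplicate chunks */
   RECLAIM STGPOOL dedupfile THRESHOLD=60                   /* then reclaim the freed space */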
Re: Deduplication and Collocation
Cool, Thanks :) I have questions about client dedup. Do you know of any redbook detail on that? Thanks, Mooney

Prather, Wanda wprat...@icfi.com wrote: Dedup only works in TSM storage pools that reside on disk (specifically devtype=FILE pools). [...]
Re: Deduplication and Collocation
https://www-304.ibm.com/support/docview.wss?context=SSGSG7&lang=all&rs=2077&wv=1&loc=en_US&cs=UTF-8&uid=swg27018576&q1=tste_webcast&dc=DA410

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Mark Mooney Sent: Tuesday, June 21, 2011 2:53 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication and Collocation

Cool, Thanks :) I have questions about client dedup. Do you know of any redbook detail on that? [...]
Re: Deduplication and Collocation
Even if a FILE devclass has dedup turned on, when the data is migrated, reclaimed, or backed up (backup stgpool) to tape, the files are reconstructed from their pieces.

You cannot dedup on DISK stgpools. DISK implies random access disk - e.g., devclass DISK. FILE implies serial access disk - e.g., devclass FILE.

But I think there is still an open question about collocation and deduplication. Deduplication must be done using FILE stgpools, but FILE stgpools CAN use collocation. I don't know what happens in this case. ..Paul

At 02:38 PM 6/21/2011, Prather, Wanda wrote: If it is a file device class with dedup turned off, yes. [...]

-- Paul Zarnowski Ph: 607-255-4757 Manager, Storage Services Fx: 607-255-8521 719 Rhodes Hall, Ithaca, NY 14853-3801 Em: p...@cornell.edu
Re: Deduplication and Collocation
Thank you Wanda! Much Appreciated!

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Prather, Wanda Sent: Tuesday, June 21, 2011 9:09 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: Deduplication and Collocation

https://www-304.ibm.com/support/docview.wss?context=SSGSG7&lang=all&rs=2077&wv=1&loc=en_US&cs=UTF-8&uid=swg27018576&q1=tste_webcast&dc=DA410 [...]
Re: Deduplication Question
Check the MOUNTLIMIT on the device class the pool uses (and the MAXNUMMP setting in the client node definition). It controls how many mount points in a sequential pool the client can use at once.

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Jim Neal Sent: Thursday, March 10, 2011 4:16 PM To: ADSM-L@VM.MARIST.EDU Subject: [ADSM-L] Deduplication Question Importance: High

Hi All, I am testing deduplication on a Red Hat 5, 64-bit x86 Linux server, using TSM Server version 6.2.2.2. I set up a storage pool for deduplication using a FILE devclass. When I try to back up a Windows client to the storage pool, I get an error that says "ANR0525W - Server media mount not possible". However, when I back up to a standard storage pool with a DISK devclass, the same client backs up perfectly, and the data can then be migrated to the deduplicated storage pool. My questions are these: 1) Can you back up a client directly to a deduplicated storage pool? If so, how? 2) What criteria should be used for the creation of volumes for a deduplicated storage pool? 3) Is there documentation that specifically says that you have to migrate data to the deduplicated storage pool? Any insight on these issues will be greatly appreciated! Thanks! Jim Neal Sr. TSM Administrator U.C. Berkeley Storage and Backup Group
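(For anyone else hitting ANR0525W on a direct backup to a FILE pool: the two limits that usually matter are the MOUNTLIMIT on the FILE device class and the MAXNUMMP value on the node, since either one can leave the client with no mount point. A sketch with placeholder names:)

   QUERY DEVCLASS filedev F=D        /* check the Mount Limit field */
   UPDATE DEVCLASS filedev MOUNTLIMIT=32

   QUERY NODE jimnode F=D            /* check Maximum Mount Points Allowed */
   UPDATE NODE jimnode MAXNUMMP=4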
Re: Deduplication Question
Thanks Wanda! That worked perfectly! I owe you one! Jim

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Prather, Wanda Sent: Thursday, March 10, 2011 1:20 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication Question

Check the MOUNTLIMIT on the device class the pool uses. It controls how many mount points in a sequential pool the client can use at once. [...]
Re: Deduplication Question
You're welcome. Been there, done that, got the scars to prove it! ;)

-Original Message- From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On Behalf Of Jim Neal Sent: Thursday, March 10, 2011 4:44 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: [ADSM-L] Deduplication Question Importance: High

Thanks Wanda! That worked perfectly! I owe you one! Jim [...]
Re: Deduplication Status
Hi Andy, Are you doing server- or client-side deduplication? What are the versions of your TSM Client and Server? Regards, Mark L. Yakushev

From: Andrew Carlson naclos...@gmail.com To: ADSM-L@vm.marist.edu Date: 04/21/2010 12:36 PM Subject: [ADSM-L] Deduplication Status

I have been looking through the commands and outputs of commands, trying to find something to tell me how much deduplication has occurred. Is there one I am missing? Thanks.

-- Andy Carlson --- Gamecube:$150, PSO:$50, Broadband Adapter: $35, Hunters License: $8.95/month, The feeling of seeing the red box with the item you want in it: Priceless.
Re: Deduplication Status
Server side dedup, Server V6.2, client V6.2.

On Wed, Apr 21, 2010 at 2:39 PM, Mark Yakushev bar...@us.ibm.com wrote: Hi Andy, Are you doing server- or client-side deduplication? What are the versions of your TSM Client and Server? Regards, Mark L. Yakushev [...]
Re: Deduplication Status
Hi Andy, there are 2 sources for this information. A column in the stgpools table has the MB saved:

tsm: select cast(stgpool_name as char(20)) as Name, -
     cast(space_saved_mb / 1024.0 / 1024.0 as decimal(6,2)) as "T Saved" from stgpools

     Name                  T Saved
     --------------------  -------
     BKP_0
     BKP_1A                   0.00
     BKP_1B                   0.00
     BKP_2                   24.38

Or 'q stg f=d' will show it:

tsm: q stg bkp_2 f=d

                   Storage Pool Name: BKP_2
                   Storage Pool Type: Primary
                   Device Class Name: VT01_50GB
                  Estimated Capacity: 50,775 G
                  ...
                   Deduplicate Data?: Yes
Processes For Identifying Duplicates: 0
           Duplicate Data Not Stored: 24,972 G (56%)

Hope this helps, Bill Colwell Draper Lab

-Original Message- From: ADSM: Dist Stor Manager [mailto:ads...@vm.marist.edu] On Behalf Of Andrew Carlson Sent: Wednesday, April 21, 2010 4:13 PM To: ADSM-L@VM.MARIST.EDU Subject: Re: Deduplication Status

Server side dedup, Server V6.2, client V6.2. [...]
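(If you want Bill's numbers from a shell script rather than the interactive admin console, dsmadmc batch mode can run the same select; the admin ID and password here are placeholders.)

   dsmadmc -id=admin -password=secret -dataonly=yes \
      "select stgpool_name, space_saved_mb from stgpools"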
Re: Deduplication Status
Thanks for all the answers. I finally figured out why I wasn't seeing anything... it helps to read everything... I didn't realize that the duplicate data is not released until reclamation processing. Thanks all.

On Wed, Apr 21, 2010 at 4:40 PM, Colwell, William F. bcolw...@draper.com wrote: Hi Andy, there are 2 sources for this information. [...]

-- Andy Carlson --- Gamecube:$150, PSO:$50, Broadband Adapter: $35, Hunters License: $8.95/month, The feeling of seeing the red box with the item you want in it: Priceless.
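(Worth underlining Andy's conclusion: identify duplicates only marks and links the duplicate chunks; the space is freed when reclamation rewrites the volumes. A sketch of the full cycle on a 6.x server, with an invented pool name:)

   IDENTIFY DUPLICATES dedupfile DURATION=120 NUMPROCESS=2  /* marks duplicate chunks */
   RECLAIM STGPOOL dedupfile THRESHOLD=50                   /* rewriting volumes frees the space */
   QUERY STGPOOL dedupfile F=D                              /* 'Duplicate Data Not Stored' reflects it */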