Re: deletion performance of large deduplicated files

2019-07-19 Thread Rick Adamson
Appreciate the input, Del, thanks!

Thank you,
-Rick Adamson



Re: deletion performance of large deduplicated files

2019-07-19 Thread Del Hoobler
Hi Eric and all,

We are aware of this situation and putting additional focus and attention 
on it.


Del



"ADSM: Dist Stor Manager"  wrote on 07/19/2019 
09:35:37 AM:

> From: "Loon, Eric van (ITOP NS) - KLM" 
> To: ADSM-L@VM.MARIST.EDU
> Date: 07/19/2019 09:36 AM
> Subject: [EXTERNAL] Re: deletion performance of large deduplicated files
> Sent by: "ADSM: Dist Stor Manager" 
> 
> Hi Rick and others!
> 
> I replicated the data of the test TDP client to multiple servers, 
> running 7.1.7, 7.1.9 and even 8.1.8: the performance sucks on all 
servers.
> We do not use client replication as part of our server protection. 
> We need a real time replication over the datacenter and thus we rely
> on host-based replication for all our servers. I tested with and 
> without replication: there is no noticeable difference.
> As a work-around I will install a non-deduplicated file stgpool on 
> the worst performing server next week. I expect the server to 
> perform better as soon as all container pool objects are expired, in
> about 3 weeks from now.
> In the meantime I will keep on pursuing IBM until it is fixed or 
> else we might need to replace the product altogether...
> 
> Kind regards,
> Eric van Loon
> Air France/KLM Storage & Backup
> 
> -Original Message-
> From: ADSM: Dist Stor Manager [mailto:ADSM-L@VM.MARIST.EDU] On 
> Behalf Of Rick Adamson
> Sent: vrijdag 19 juli 2019 14:45
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: deletion performance of large deduplicated files
> 
> Eric, Michael,
> I have been working through similar struggles and as I read your 
> posts had to wonder, can you provide some details on your server and
> client versions?
> Basically I now have experience/exposure to every version/
> maintenance pack/patch SP has put out since 7.1.3.x
> 
> My servers are now running on  8.1.8 (windows) and has seemed to 
> have stabilized (for now).
> One thing that was causing me a lot of grief was using directory 
> storage pools with client side deduplication enabled, particularly 
> on data protection products (all of them).
> Afterwards there was a lot of cleanup; auditing containers, 
> confirming that protect storage was completing successfully, and 
> finally doing the same for replicate node processes.
> 
> I found that understandably server performance takes a severe nose 
> dive if you are trying to process (protect/replicate) damaged 
> containers, and most likely restores will be compromised as well.
> 
> 
> Thank you,
> -Rick Adamson
> 
> 
> -Original Message-
> From: ADSM: Dist Stor Manager  On Behalf Of 
Michael Prix
> Sent: Friday, July 19, 2019 6:11 AM
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] deletion performance of large deduplicated files
> 
> * This email originated outside of the organization. Use caution 
> when opening attachments or clicking links. *
> 
> --
> Hello Eric,
> 
> 
> 
>   welcome to my nightmares. Take a seat, wanna have a drink?
> 
> 
> 
> I had the pleasure of performance and data corruption PMRs during the 
last two
> 
> years with TDP Oracle. Yes, at first the customer got blamed for not 
adhering
> 
> completely to to blueprints, but after some weeks it boild down to  ...
> 
> silence.
> 
> Data corruption was because of what ended in IT28096 - now fixed.
> 
> Performance is interesting, but resembles to what you have written. We 
work
> 
> with MAXPIECESIZE settings on RMAN to keep the backup pieces small and 
got
> 
> some interesting values, pending further observation, but we might be on 
a
> 
> cheerful way. I'm talking about database sizes of 50TB here, warehouse 
style.
> 
>   In between we moved the big DBs to a dedicated server to prove that 
the
> 
> performance drop is because of the big DBs, and the remaining "small" 
DBs  -
> 
> size of 500MB up to 5TB - didn't put any measurable stress on the DB in 
terms
> 
> of expiration and protect stgpool. Even the big DBs on their dedicated 
server
> 
> performed better in terms of expiration and protect stgpool, which might 
have
> 
> been a coincidence of these DBs holding nearly the same data and having 
the
> 
> same retention period.
> 
> 
> 
> What I can't observe is a slowness of the DB. Queries are answered in 
the
> 
> normal time - depending on the query. a count(*) from backupobjects 
naturally
> 
> takes some time, considerably longer when you use dedup, but the daily 
queries
> 
> are answered in the "normal" timeframe.
> 
> 
> 
> What helped immediately was some tuning:
> 
> - More LUNS and filesystems for the TSM-DB
> 
> - smaller disks, but more of them, for each filesystem.
> 
>   changing the disks from 100GB to 2 x 50GB for each DB-filesystem got 
me a
> 
> performance boost of 200% in expiration and backup db. Unbelievable,but 
true.
> 
> Yes, I'm using SSD. And SVC. And multiple storage systems. Performance 
isn't
> 
> the problem, we are 

Re: deletion performance of large deduplicated files

2019-07-19 Thread Kizzire, Chris
OK, thanks, Michael. We'll give it a try.

Chris Kizzire
Backup Administrator (Network Engineer II)

BROOKWOOD BAPTIST HEALTH
Information Systems
O:   205.820.5973

chris.kizz...@bhsala.com
BROOKWOODBAPTISTHEALTH.COM



Re: deletion performance of large deduplicated files

2019-07-19 Thread Michael Prix
Hello Eric,

  could you describe your server setup a bit?
What type of data are you storing? SQL (which), VM, File, ...
Is everything going into one dedup pool or are they split for different types
of data?

I assume your TSM-servers run on AIX.

In general, performant backup and restore is no problem with deduplicated
storage pools if done the right way.
Housekeeping is a completely different story.

--
Michael Prix


Re: deletion performance of large deduplicated files

2019-07-19 Thread Michael Prix
Chris,

  yours is easy to answer: You have a performance problem, not a TSM problem.
For VM backup I have a dedicated LPAR in an S842, 3 vCPU, 128GB RAM, SR-IOV 10Gb.
Storage is a V7000, SSD/SAS. There has never been a backup or restore problem
performance-wise; everything runs at wire speed.
 The only problem we had was client-side dedup: IT28096 hit us badly, 5 days of
audit for each TSM instance over and over again until the reason was identified.

For a start: keep the size of the hdisks small for the DB. Better 2 x 50GB than one
100GB. chfs -e x and reorgvg are your friends. The mount option "rbrw,noatime" is of
great use for the DB, log, archive log and storage pool filesystems. Check your
queue_depth and set it to the maximum.
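
For illustration, the AIX side of that advice looks roughly like the following;
the hdisk and filesystem names are placeholders and the usable queue_depth
maximum depends on the adapter and driver:

  # check and raise the queue depth on a DB hdisk
  lsattr -El hdisk10 -a queue_depth
  chdev -l hdisk10 -a queue_depth=64 -P   # -P: change takes effect at the next varyon/reboot

  # release-behind reads/writes and no atime updates on a DB/log/stgpool filesystem
  chfs -a options=rbrw,noatime /tsmdb01
  umount /tsmdb01 && mount /tsmdb01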

As for SQL: use as many streams as the client can handle. CPU is there to
handle load, not to waste energy by idling.

--
Michael Prix


Re: deletion performance of large deduplicated files

2019-07-19 Thread Loon, Eric van (ITOP NS) - KLM
Hi Rick and others!

I replicated the data of the test TDP client to multiple servers, running 
7.1.7, 7.1.9 and even 8.1.8: the performance sucks on all servers.
We do not use client replication as part of our server protection. We need
real-time replication across the datacenter and thus we rely on host-based
replication for all our servers. I tested with and without replication: there
is no noticeable difference.
As a workaround I will install a non-deduplicated file stgpool on the
worst-performing server next week. I expect the server to perform better as
soon as all container pool objects have expired, in about 3 weeks from now.
In the meantime I will keep on pursuing IBM until it is fixed or else we might 
need to replace the product altogether...
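
For reference, a minimal sketch of such a non-deduplicated FILE pool (the device
class name, directories and sizes are made-up examples; updating the copy group
destination and activating the policy set are not shown):

  define devclass filenodedup devtype=file mountlimit=64 maxcapacity=50g directory=/tsmfile01,/tsmfile02
  define stgpool ora_nodedup filenodedup maxscratch=500 deduplicate=no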

Kind regards,
Eric van Loon
Air France/KLM Storage & Backup


Re: deletion performance of large deduplicated files

2019-07-19 Thread Kizzire, Chris
Alas... we are not the only ones. I knew it...
We went from TSM 6.3 to SP 8.1.4.0. Performance is 5 times slower in the new
environment overall. We use container pools, dedup, and compression. We use SP
for VE for most VMs, and the BA client and SQL client for physical machines and
VMs with SQL.
It took about 17 hours to restore a 4.5TB SQL DB the other day.
Main Server:
  IBM Power 750 running AIX 7.2
  3.1TB DB is on SSD
  128GB RAM
Server at DR site:
  IBM Power 770 running AIX 7.2
  3.1TB DB NOT on SSD
  100ish GB RAM
Container Pool is on a 1.5-year-old IBM V5030, with an identical V5030 at the DR site.

IBM says open a performance PMR for AIX - which we have yet to do.
Protect stgpool runs for days and we have to cancel it because we get too far
behind on replication. If we are lucky we can replicate maybe 12TB in a 24-hour
period with 100 sessions (maxsessions=50).
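
For context, both commands take their session limits directly; a rough sketch
with a placeholder pool name:

  protect stgpool containerpool maxsessions=20 wait=no   # copies data extents to the target server
  replicate node * maxsessions=50 wait=no                # replicates metadata plus anything not yet protected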


Chris Kizzire
Backup Administrator (Network Engineer II)

BROOKWOOD BAPTIST HEALTH
Information Systems
O:   205.820.5973

chris.kizz...@bhsala.com
BROOKWOODBAPTISTHEALTH.COM



Re: deletion performance of large deduplicated files

2019-07-19 Thread Rick Adamson
Eric, Michael,
I have been working through similar struggles, and as I read your posts I had to
wonder: can you provide some details on your server and client versions?
Basically I now have experience/exposure to every version/maintenance pack/patch
SP has put out since 7.1.3.x.

My servers are now running 8.1.8 (Windows) and seem to have stabilized (for now).
One thing that was causing me a lot of grief was using directory storage pools
with client-side deduplication enabled, particularly on data protection products
(all of them).
Afterwards there was a lot of cleanup: auditing containers, confirming that
protect stgpool was completing successfully, and finally doing the same for
replicate node processes.

I found that, understandably, server performance takes a severe nosedive if you
are trying to process (protect/replicate) damaged containers, and most likely
restores will be compromised as well.
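
A rough sketch of that cleanup sequence from the admin command line (pool and
node names are placeholders):

  query damaged containerpool                             # list damaged extents in the directory pool
  audit container stgpool=containerpool action=scanall wait=no
  protect stgpool containerpool wait=yes                  # confirm protection completes cleanly
  replicate node ora_node01 wait=yes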


Thank you,
-Rick Adamson



Re: deletion performance of large deduplicated files

2019-07-19 Thread Michael Prix
Hello Eric,

  welcome to my nightmares. Take a seat, wanna have a drink?

I had the pleasure of performance and data corruption PMRs during the last two
years with TDP for Oracle. Yes, at first the customer got blamed for not adhering
completely to the blueprints, but after some weeks it boiled down to ... silence.
Data corruption was because of what ended up as IT28096 - now fixed.
Performance is interesting, but it resembles what you have written. We work with
MAXPIECESIZE settings in RMAN to keep the backup pieces small and got some
interesting values, pending further observation, but we might be on a promising
track. I'm talking about database sizes of 50TB here, warehouse style.
  In the meantime we moved the big DBs to a dedicated server to prove that the
performance drop is because of the big DBs, and the remaining "small" DBs - sized
from 500MB up to 5TB - didn't put any measurable stress on the DB in terms of
expiration and protect stgpool. Even the big DBs on their dedicated server
performed better in terms of expiration and protect stgpool, which might have
been a coincidence of these DBs holding nearly the same data and having the
same retention period.
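
For illustration, the piece-size limit sits on the RMAN side and looks roughly
like this (the TDPO_OPTFILE path and the 8G value are examples, not a
recommendation):

  # inside an RMAN session against the target database
  CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE'
    PARMS 'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)'
    MAXPIECESIZE 8G;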

What I can't observe is any slowness of the DB. Queries are answered in the
normal time - depending on the query. A count(*) from backupobjects naturally
takes some time, considerably longer when you use dedup, but the daily queries
are answered in the "normal" timeframe.

What helped immediately was some tuning:
- More LUNs and filesystems for the TSM DB
- Smaller disks, but more of them, for each filesystem.
  Changing the disks from 100GB to 2 x 50GB for each DB filesystem got me a
performance boost of 200% in expiration and backup db. Unbelievable, but true.
  Yes, I'm using SSD. And SVC. And multiple storage systems. Performance isn't
the problem; we are measuring 2ms response time for write AND read.
- A stripe set for each fileset
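
On the server side, adding the extra filesystems to the database is a one-liner;
a sketch with placeholder paths (recent server levels redistribute existing data
across the new directories):

  extend dbspace /tsmdb05,/tsmdb06
  query dbspace        # verify the new directories and their utilization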


--
Michael Prix


deletion performance of large deduplicated files

2019-07-19 Thread Loon, Eric van (ITOP NS) - KLM
Hi TSM/SP-ers,

We have been struggling with the performance of our TSM servers for months now.
We have been running several servers with hardware (Data Domain) dedup for years
without any problems, but on our new servers with directory container pools
performance is really, really bad.
The servers and storage are designed according to the Blueprints and they work
fine as long as you do not add large database (Oracle and SAP) clients to them.
As soon as you do, the overall server performance becomes very bad: client and
admin session initiation takes 20 to 40 seconds, SQL queries run for minutes
where they should take a few seconds, and a query stgpool sometimes takes more
than a minute to respond!
I have two cases open for this. In one case we focused a lot on the OS and disk
performance, but during that process I noticed that the problem is most likely
caused by the way TSM processes large (a few hundred MB) files. I performed a
large number of tests and came to the conclusion that it takes TSM a huge amount
of time to delete large deduplicated files, both in container pools and in
deduplicated file pools. As a test I use a TDP for Oracle client which uses a
backup piece size of 900 MB. The client contains about 5000 files. Deleting the
files from a container pool takes more than an hour. When you run a delete
object for the files individually, I see that most files take more than a
second(!) to delete. If I put that same data in a non-deduplicated file pool,
a delete filespace takes about 15 seconds...
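
For anyone who wants to reproduce the comparison, a rough way to time it from
the administrative CLI (admin ID, node and filespace names are placeholders):

  # count the test node's backup objects
  dsmadmc -id=admin -password=xxxxx -dataonly=yes "select count(*) from backups where node_name='ORA_TEST'"
  # time the bulk deletion of the same data
  time dsmadmc -id=admin -password=xxxxx "delete filespace ORA_TEST /adsmorc type=any data=any wait=yes"
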
The main issue is that the TDP clients are doing the exact same thing: as soon
as a backup file is no longer needed, it's removed from the RMAN catalog and
deleted from TSM. Since we have several huge database clients (multiple TBs
each) these Oracle delete jobs tend to run for hours. These delete jobs also
seem to slow each other down: when several of them run at the same time, they
become even slower. At this point I have one server where these jobs are running
24 hours per day! This server is at the moment the worst-performing TSM server I
have ever seen. On the other container pool servers I was able to move the
Oracle and SAP servers away to the old servers (the ones with the Data Domain),
but on this one I can't because of Data Domain capacity reasons.
For this file deletion performance I also have a case open, but there is
absolutely no progress. I showed IBM how bad the performance is and I even
offered them a copy of our database so they can see for themselves, but only
silence from development...
One thing I do not understand: I find it very hard to believe that we are the
only ones suffering from this issue. There must be dozens of TSM users out there
that back up large databases to TSM container pools?

Kind regards,
Eric van Loon
Air France/KLM Storage & Backup
