[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2022-04-04 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516858#comment-17516858
 ] 

Steve Loughran commented on HADOOP-16415:
-

Turns out the fix for this is an M1 MBP.

> Speed up S3A test runs
> --
>
> Key: HADOOP-16415
> URL: https://issues.apache.org/jira/browse/HADOOP-16415
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Priority: Major
>
> S3A test runs are way too slow.
> Speed them up by
> * reducing test setup/teardown costs
> * eliminating obsolete test cases
> * merging small tests into larger ones.
> One thing I see is that the main S3A test cases create and destroy new FS
> instances; there's both a setup and teardown cost there, but it does
> guarantee better isolation.
> Maybe if we know all test cases in a specific suite need the same options, we
> can manage that better: demand-create the FS but only delete it in an
> @AfterClass method. That'd keep the OO-inheritance-based setup of tests,
> but mean only one instance is created per suite.
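A minimal sketch of the "demand-create, close in @AfterClass" idea, in plain Java so it runs without Hadoop or JUnit on the classpath. {{SuiteScopedResource}} and {{FakeFs}} are hypothetical names: {{FakeFs}} stands in for an S3A FileSystem instance and only counts how often it is created and closed. In a real JUnit 4 suite the holder would be a static field, {{get()}} would replace the per-test FS creation, and {{closeShared()}} would be called from a static {{@AfterClass}} method.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical holder for one expensive resource shared by every test case
 * in a suite: created on first use, closed once at the end of the suite.
 */
final class SuiteScopedResource<T extends AutoCloseable> {
  private final Supplier<T> factory;
  private T instance;

  SuiteScopedResource(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Demand-create the resource on first call; later calls return the same instance. */
  synchronized T get() {
    if (instance == null) {
      instance = factory.get();
    }
    return instance;
  }

  /** Close the shared instance, if any; intended to be called from @AfterClass. */
  synchronized void closeShared() {
    if (instance == null) {
      return;
    }
    try {
      instance.close();
    } catch (Exception e) {
      throw new IllegalStateException("failed to close shared resource", e);
    } finally {
      instance = null;
    }
  }
}

/** Hypothetical stand-in for an S3A FileSystem: only counts creations and closes. */
final class FakeFs implements AutoCloseable {
  static final AtomicInteger CREATED = new AtomicInteger();
  static final AtomicInteger CLOSED = new AtomicInteger();

  FakeFs() {
    CREATED.incrementAndGet();
  }

  @Override
  public void close() {
    CLOSED.incrementAndGet();
  }
}
```

The price is the isolation mentioned above: any test case that changes the shared instance's configuration or state leaks that change into the rest of the suite, so this only suits suites where every case really does need the same options.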



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2021-01-04 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258420#comment-17258420
 ] 

Steve Loughran commented on HADOOP-16415:
-

The V1 list API tests take 100s. Do we need all of these? Better: a minimal set of tests.
 102.663 s - in org.apache.hadoop.fs.s3a.ITestS3AContractGetFileStatusV1List

ITestS3ARemoteFileChanged takes 800s because it is so heavily parameterized,
including by change detection policy on open streams.

Not all tests change behaviour with those options, especially the rename ones.
Better: split into tests which read file data and tests which just manipulate
files.

With S3Guard off, we still need to test what happens when a file is
changed while open. We shouldn't need to worry about mismatches between listings
and opened/renamed files.
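One way that split could be expressed, sketched under assumptions: the source/mode value names below are loosely modelled on S3A's change-detection options and are illustrative only, and {{ChangeDetectionParams}} / {{parametersFor()}} are hypothetical. Tests which read file data get every source/mode combination; metadata-only tests (rename and friends) run once with a single default combination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical parameter source: only tests that read file data get the full
 * change-detection matrix; metadata-only tests run with one default combination.
 */
final class ChangeDetectionParams {
  // Illustrative values, loosely modelled on S3A's change-detection source/mode options.
  static final List<String> SOURCES = Arrays.asList("etag", "versionid");
  static final List<String> MODES = Arrays.asList("server", "client", "warn", "none");

  enum TestKind { READS_DATA, METADATA_ONLY }

  /** Returns the {source, mode} pairs a test of the given kind should run with. */
  static List<String[]> parametersFor(TestKind kind) {
    if (kind == TestKind.METADATA_ONLY) {
      // rename/delete-style tests don't change behaviour with these options
      return Collections.singletonList(new String[] {SOURCES.get(0), MODES.get(0)});
    }
    List<String[]> all = new ArrayList<>();
    for (String source : SOURCES) {
      for (String mode : MODES) {
        all.add(new String[] {source, mode});
      }
    }
    return all;
  }
}
```

With these illustrative lists, data-reading tests still run 8 times each, but every metadata-only test drops from 8 runs to 1.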




[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2021-01-04 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258415#comment-17258415
 ] 

Steve Loughran commented on HADOOP-16415:
-

h3. Huge tests

We have too many of the Huge upload tests, one for each buffer mechanism.

Proposed: 
* only test disk buffering
* make sure we have unit tests of the other buffer mechanisms with large buffers,
verifying that we can mark/reset back to the beginning, which is what the AWS SDK needs
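A sketch of such a unit test, assuming only the JDK: {{ByteArrayInputStream}} stands in for the upload stream of a non-disk buffer, and {{MarkResetCheck}} is a hypothetical helper. It marks at the start, reads the whole block, resets, and checks that the second read is byte-for-byte identical, i.e. that the upload could be replayed from the beginning.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical helper for unit-testing that a buffer's upload stream can be replayed. */
final class MarkResetCheck {

  /** Read the stream until EOF. */
  static byte[] readToEnd(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      out.write(chunk, 0, n);
    }
    return out.toByteArray();
  }

  /**
   * Mark at the start, read everything, reset, read everything again.
   * True only if mark is supported, the stream held expectedLength bytes,
   * and the replayed bytes match the first read.
   */
  static boolean canReplayFromStart(InputStream in, int expectedLength) {
    if (!in.markSupported()) {
      return false;
    }
    try {
      in.mark(expectedLength + 1);   // read limit covers the whole block
      byte[] first = readToEnd(in);
      in.reset();                    // back to the beginning, as a retry would need
      byte[] second = readToEnd(in);
      return first.length == expectedLength && Arrays.equals(first, second);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```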

h3. Surprisingly slow

85.901 s - in org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextURI: too many
exists/isFile/isDir checks. Best to only do isDir.
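To illustrate why the extra probes cost time, a sketch with a hypothetical in-memory {{CountingStore}} where every metadata probe counts as one remote request. In the base {{FileSystem}} class, {{exists()}}, {{isFile()}} and {{isDirectory()}} each go through {{getFileStatus()}}, so asserting all three on one path costs roughly three status lookups against the store, where a single {{isDirectory()}} check costs one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory store where each metadata probe counts as one remote request. */
final class CountingStore {
  enum Kind { FILE, DIRECTORY }

  private final Map<String, Kind> entries = new HashMap<>();
  private int requests;

  void put(String path, Kind kind) {
    entries.put(path, kind);
  }

  int requests() {
    return requests;
  }

  void resetRequests() {
    requests = 0;
  }

  /** One simulated status lookup (think getFileStatus()). */
  Optional<Kind> status(String path) {
    requests++;
    return Optional.ofNullable(entries.get(path));
  }

  // Each predicate below is a separate lookup, mirroring the base FileSystem behaviour.
  boolean exists(String path) {
    return status(path).isPresent();
  }

  boolean isFile(String path) {
    return status(path).filter(k -> k == Kind.FILE).isPresent();
  }

  boolean isDirectory(String path) {
    return status(path).filter(k -> k == Kind.DIRECTORY).isPresent();
  }
}
```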




[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-17 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887449#comment-16887449
 ] 

Steve Loughran commented on HADOOP-16415:
-

The {{ITestAssumedRoleCommitOperations}} suite is pretty slow, and it just runs
ITestCommitOperations under an assumed role.
* we could make this the default behaviour if the assumed role option is set.

The commit protocol tests are all critical; maybe we could combine some tests
into longer ones. Risk of higher maintenance, though.

ITestS3GuardConcurrentOps should be converted to a scale test.

The 60s for the SSE-C encryption test {{ITestS3AEncryptionSSEC}} is
particularly painful, given SSE-C isn't used much. It doesn't do much, but we
can't parallelize it or it screws up everything else. Maybe we could merge the
tests, but if there's an overhead with every test case, it has to be the
create/destroy of an S3A instance per test case.




[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-17 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887446#comment-16887446
 ] 

Steve Loughran commented on HADOOP-16415:
-

Test run timings with HADOOP-16422 applied to shave 3 min off, using
{{-Dparallel-tests -DtestsThreadCount=12 -Ds3guard -Ddynamo}}:

Interesting to see how the parallel phase has got more tests in: the remote 
file changed, S3 Select and delegation token MR jobs are all new since Hadoop 
3.3. 

{code}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractSeek
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractRename
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractUnbuffer
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractMkdir
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractOpen
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractCreate
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractGetFileStatus
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractDelete
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 20.064 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractUnbuffer
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
[WARNING] Tests run: 15, Failures: 0, Errors: 0, Skipped: 15, Time elapsed: 23.067 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractMultipartUploader
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.1 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractDelete
[INFO] Running org.apache.hadoop.fs.s3a.commit.magic.ITestMagicCommitProtocol
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.853 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractOpen
[INFO] Running org.apache.hadoop.fs.s3a.commit.integration.ITestS3ACommitterMRJob
[WARNING] Tests run: 11, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 32.889 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractCreate
[INFO] Running org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
[INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.578 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractMkdir
[INFO] Running org.apache.hadoop.fs.s3a.commit.ITestCommitOperations
[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.284 s - in org.apache.hadoop.fs.s3a.ITestS3ATemporaryCredentials
[INFO] Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.147 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractRename
[INFO] Running org.apache.hadoop.fs.s3a.impl.ITestPartialRenamesDeletes
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.821 s - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory
[INFO] Running org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextURI
[INFO] Running org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContext
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.135 s - in org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContext
[INFO] Running org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextUtil
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.89 s - in org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextUtil
[INFO] Running org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextCreateMkdir
[INFO] Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.953 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractGetFileStatus
[INFO] Running org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextMainOperations
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.936 s - in org.apache.hadoop.fs.s3a.fileContext.ITestS3AFileContextCreateMkdir
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3GuardTtl
[INFO] Tests run: 72, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 108.264 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractSeek
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3GuardCreate
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.302 s - in org.apache.hadoop.fs.s3a.ITestS3GuardCreate
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3AEncryptionSSES3
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.607 s - in
{code}

[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-12 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883813#comment-16883813
 ] 

Steve Loughran commented on HADOOP-16415:
-

Thinking more about this:

We're running the same operation (MR job or terasort sequence) with different
cluster configs, which is exactly what parameterized test runs can do. So we
just need some parameterization which declares everything a specific test run
needs:
* config options
* extra callbacks on validation
* expected outcomes

Then we have a test which brings up the mini YARN cluster in static setup,
destroys it in teardown, and runs the scenarios as parameterized cases.

The only thing we'd need to do is implement a significantly more complex
parameterization than normal, with each parameter being a class declaring
everything needed for its run. Ideally, one which we could share between the
Terasort and the TestMRJob tests.
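A sketch of that shape, with everything Hadoop-specific replaced by hypothetical stand-ins: {{JobScenario}} holds the per-run config options, extra validation callback and expected outcome, and {{FakeMiniCluster}} pretends to be the mini YARN cluster, only counting how often it is started. {{fs.s3a.committer.name}} is the real S3A committer option, but the "job" here just echoes it back. The runner starts the cluster once and runs every scenario against it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical: everything one parameterized job run declares. */
final class JobScenario {
  final String name;
  final Map<String, String> options;          // config options for this run
  final Predicate<String> extraValidation;    // extra callback on the job output
  final boolean expectSuccess;                // expected outcome

  JobScenario(String name, Map<String, String> options,
      Predicate<String> extraValidation, boolean expectSuccess) {
    this.name = name;
    this.options = options;
    this.extraValidation = extraValidation;
    this.expectSuccess = expectSuccess;
  }
}

/** Hypothetical stand-in for a mini YARN cluster; counts how often it starts. */
final class FakeMiniCluster implements AutoCloseable {
  static int starts = 0;

  FakeMiniCluster() {
    starts++;
  }

  /** Pretend job: "succeeds" only if a committer is configured, and reports which one. */
  String runJob(Map<String, String> options) {
    String committer = options.get("fs.s3a.committer.name");
    return committer == null ? null : "committed by " + committer;
  }

  @Override
  public void close() {
    // a real cluster would be stopped here
  }
}

final class ScenarioSuite {
  /** Start the cluster once, run every scenario on it, return names of mismatches. */
  static List<String> runAll(List<JobScenario> scenarios) {
    List<String> mismatches = new ArrayList<>();
    try (FakeMiniCluster cluster = new FakeMiniCluster()) {
      for (JobScenario s : scenarios) {
        String output = cluster.runJob(s.options);
        boolean succeeded = output != null && s.extraValidation.test(output);
        if (succeeded != s.expectSuccess) {
          mismatches.add(s.name);
        }
      }
    }
    return mismatches;
  }
}
```

In JUnit 4 this would map onto a {{Parameterized}} runner whose parameters are the scenario objects, with the cluster held in a static field created in {{@BeforeClass}} and stopped in {{@AfterClass}}.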






[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-10 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881996#comment-16881996
 ] 

Steve Loughran commented on HADOOP-16415:
-

All the tests of S3A committers are slow because each one spins up its own YARN
cluster (slow) and then runs the actual MR jobs.

Proposal: rework so that the mini YARN cluster comes up once, with each job
running as its own test suite. This is complex for the Terasort tests (only
run on scale) as they are implemented as an ordered set of test cases; we'd need
to copy the base suite and rework it to run terasort for the directory and magic
committers in sequence.

The ITest*CommitProtocol suites are slow too: 300-400+ seconds each.
 * do we need the staging one?
 * what can we do for better parallelism here? Even if it's just faster creation
of temp files, deletion, etc.





[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881473#comment-16881473
 ] 

Steve Loughran commented on HADOOP-16415:
-

And of course we can parallelise existing tests better by splitting up single
large test suites into smaller ones, with the FcContext one being the obvious
target. But if we improve recycling of S3A FS instances across a single test
suite, we'd actually get more benefit from the larger suites.

See HADOOP-13330 for what we can do w.r.t. delete speedup. We can do the
S3Guard updates incrementally and internally, issuing batch updates in
parallel. When deleting 1000 files, DynamoDB, and hence S3Guard, becomes the
bottleneck.
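A generic sketch of "batch, then issue the batches in parallel", not the HADOOP-13330 design itself: {{BatchedDeletes}} and the {{batchWriter}} callback are hypothetical, with the callback standing in for one DynamoDB {{BatchWriteItem}} request. The batch size of 25 is DynamoDB's per-request item limit for {{BatchWriteItem}}, so 1000 entries means 40 requests, which is why issuing them concurrently rather than one after another matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

final class BatchedDeletes {
  /** DynamoDB's BatchWriteItem accepts at most 25 items per request. */
  static final int DDB_BATCH_LIMIT = 25;

  /** Split a list into consecutive batches of at most batchSize entries. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += batchSize) {
      batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
    }
    return batches;
  }

  /** Issue every batch concurrently on a fixed pool and wait for all of them. */
  static void deleteInParallel(List<String> paths, Consumer<List<String>> batchWriter, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<CompletableFuture<Void>> pending = new ArrayList<>();
      for (List<String> batch : partition(paths, DDB_BATCH_LIMIT)) {
        pending.add(CompletableFuture.runAsync(() -> batchWriter.accept(batch), pool));
      }
      // join() rethrows the first batch failure as an unchecked CompletionException
      CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
    } finally {
      pool.shutdown();
    }
  }
}
```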




[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881454#comment-16881454
 ] 

Steve Loughran commented on HADOOP-16415:
-

Tests which create data should do it in parallel where possible.
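A minimal sketch of parallel test-data setup using a plain {{ExecutorService}}. {{ParallelSetup}} is hypothetical and the {{createFile}} callback stands in for whatever actually writes a file (e.g. an S3A {{create()}} followed by {{close()}}); waiting on every {{Future}} means a failed write still fails the setup rather than being lost in the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelSetup {
  /**
   * Create every path by submitting one task per path to a fixed pool,
   * then wait for all of them so failures surface in the test setup.
   */
  static void createAll(List<String> paths, Consumer<String> createFile, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String path : paths) {
        pending.add(pool.submit(() -> createFile.accept(path)));
      }
      for (Future<?> f : pending) {
        f.get();  // rethrows any creation failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while creating test data", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("failed to create test data", e.getCause());
    } finally {
      pool.shutdownNow();
    }
  }
}
```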

Note that the test times printed in JUnit reports seem to measure only the execution
time of the test case, not that of setup/teardown, so they will underestimate delays.




[jira] [Commented] (HADOOP-16415) Speed up S3A test runs

2019-07-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881446#comment-16881446
 ] 

Steve Loughran commented on HADOOP-16415:
-

Other idea: make S3 + S3Guard faster :)
