[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734141#comment-16734141 ] Gabor Bota commented on HADOOP-15819: - Created HADOOP-16027 to add this to docs. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733557#comment-16733557 ] Steve Loughran commented on HADOOP-15819: - I think we should write something down about "effective use of FS instances", so yes, a new JIRA please I wonder what we should say # tests are fastest if they can recycle the existing FS instance from the same JVM # if you do that, you MUST NOT close them. # if you want a guarantee of 100% isolation, or an instance with unique config, create a new instance # which you MUST close in teardown to avoid leakage of thread pools + what you've proposed "do not add...", with the other point being "only do this if you are trying to get unmodified code to pick up your custom instance, such as when playing with mocks" > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733167#comment-16733167 ] Gabor Bota commented on HADOOP-15819: - We could extend the docs with something like "Do not add FileSystem instances to the cache that will be modified during the test runs with e.g org.apache.hadoop.fs.FileSystem#addFileSystemForTesting. This can cause other tests to fail when using the same modified or closed FS instance. For more details see HADOOP-15819." Should I create a new jira for this and upload a patch? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732172#comment-16732172 ] Sean Mackrory commented on HADOOP-15819: I wouldn't have thought so as I think you have to go our of your way to do something you don't usually do to hit this, but then I don't fully understand why this was added in the first place. I think the patch originally came from you - do you happen to remember why we started adding the FS instance to the cache here? And if so, is that something that might occur to a future test writer as well? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731950#comment-16731950 ] Steve Loughran commented on HADOOP-15819: - quick follow up here: do we need to add anything to the testing s3a doc for anyone writing new tests? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729701#comment-16729701 ] Adam Antal commented on HADOOP-15819: - Thanks [~mackrorysd] for the commit. The credit goes to [~gabor.bota] who started the investigation, and also to [~ste...@apache.org] for the useful comments. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729695#comment-16729695 ] Hudson commented on HADOOP-15819: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15665 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15665/]) HADOOP-15819. FileSystem cache misused in S3A integration tests. (mackrorysd: rev d8f670ff28c1a9f49ef908a23850bcfddd4f46dc) * (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729682#comment-16729682 ] Sean Mackrory commented on HADOOP-15819: Committed. Thank you for the excellent work here [~adam.antal]. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726305#comment-16726305 ] Aaron Fabbri commented on HADOOP-15819: --- Just catching up on this as I hit the bug again. Sounds like a great find [~adam.antal]. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725843#comment-16725843 ] Adam Antal commented on HADOOP-15819: - Indeed. I also saw that it is used elsewhere, so I would not recommend modifying the function by manually setting the FS.key there. In my opinion if we manually inject the FS into the cache maybe we should manually remove it (for e.g. in the teardown of the test). Although I still don't see why injecting is necessary. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474)
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725544#comment-16725544 ] Sean Mackrory commented on HADOOP-15819: Ah - FS.key not being set is the piece I was missing. I suppose the truly correct solution here would be to make sure FS.key is set properly, but I don't know that the benefit to that is worth a whole lot of your time. I'll hold off for a day in case [~ste...@apache.org] who wrote this in the first place disagrees. Otherwise +1 and I'll commit soon. As a side note, I noticed StagingTestBase has a call to the same function, but obviously isn't causing the same issue. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725168#comment-16725168 ] Adam Antal commented on HADOOP-15819: - Not exactly, to be more clear though I enumerate the events when calling any subtest of {{AbstractITCommitProtocol}}: * During the very first moment of the startup {{setup()}} and {{init()}} functions are called. Concretely {{AbstractBondedFSContract}} creates and stores a new {{FileSystem}} instance of the proper type. ** Later, each time the getFileSystem() is called, it returns this "cached" instance (This is NOT the FSCache! just a private data member of {{AbstractBondedFSContract}}). ** Note: since {{AbstractITCommitProtocol.createConfiguration()}} contains {{disableFilesystemCaching(conf)}}, the cache is disabled. Each time this Configuration object is used anywhere in the test, it returns a newly created instance by the static {{FileSystem.get()}} method. * During this startup in {{AbstractITCommitProtocol.setup()}} the line {{bindFileSystem(fileSystem, outDir, fileSystem.getConf());}} is executed as well. We call the internal {{getFileSystem()}}, which returns the already created and stored instance, and inject it directly into the FSCache. * The test runs using the stored instance and the non-cached instances of FileSystem. * During teardown we close the stored FileSystem object as usual, BUT since we didn't put this instance into the FSCache the regular way it got stuck in the cache. Although the {{close()}} contains this call: {{CACHE.remove(this.key, this);}}, the {{this.key}} (so the FileSystem's FileSystem.Cache private member) is not set. It would have been set, if it had been put to the cache in the regular way, but not by injection. More concretely {{Cache.getInternal}} sets this.key of the FileSystem object, which is surpassed when the FS instance is manually injected. * Later, in the same JVM another test is setting up using the {{s3a://}} scheme, this object gets retrieved from the cache (since it's not removed), but now it's closed, and boom. I'd rather not mess with manual injection, that's why I proposed to delete the binding. I wasn't 100% right with my previous comment (you can practically ignore that), but I think this is correct way how things work. Still I can make mistakes, so please review my thoughts on this. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR]
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724301#comment-16724301 ] Sean Mackrory commented on HADOOP-15819: The only failure I see with this patch is the bouncycastle one. I don't have an objection to removing this - it's only for a slight perf improvement on tests. I'm curious to know why this is resulting an a distinct cache entry. Is the URL slightly different at this point than it is in FS.newInstance() or something? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723105#comment-16723105 ] Adam Antal commented on HADOOP-15819: - Thanks, [~gabor.bota]. I uploaded patch [^HADOOP-15819.002.patch] (removed the unused method). > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.002.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193)
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721551#comment-16721551 ] Gabor Bota commented on HADOOP-15819: - Thanks for the patch [~adam.antal]. Not just the call, but the whole method could be removed: org.apache.hadoop.fs.s3a.commit.AbstractITCommitProtocol#bindFileSystem, because there are no other call for that method. I think adding the fs to the cache with {{FileSystemTestHelper.addFileSystemForTesting}} from {{AbstractITCommitProtocol#bindFileSystem}} is not needed, because we will use the same FileSystem object throughout the tests from {{getFileSystem().}} [~mackrorysd] do could you add some thoughts if removing the bindFileSystem is a good fix for this issue? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Adam Antal >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > S3ACloseEnforcedFileSystem.java, S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713791#comment-16713791 ] Adam Antal commented on HADOOP-15819: - I removed it. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > S3ACloseEnforcedFileSystem.java, S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at > org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40) > at
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712683#comment-16712683 ] Gabor Bota commented on HADOOP-15819: - Thanks [~adam.antal], could you remove the bad one? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.001.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711966#comment-16711966 ] Adam Antal commented on HADOOP-15819: - I'm sorry, uploaded bad patch (reuploaded the right one [^HADOOP-15819.001.patch] ). > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > HADOOP-15819.001.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711957#comment-16711957 ] Gabor Bota commented on HADOOP-15819: - Thanks [~adam.antal], I'll test it tomorrow up and downstream. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > S3ACloseEnforcedFileSystem.java, S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711917#comment-16711917 ] Adam Antal commented on HADOOP-15819: - I'd also like to add that it worked for my disabled-cache version of trunk (so I disabled cache for {{AbstractITCommitProtocol }}previously) - it was added to the patch as well. It actually works without disabling cache (so only removing the bindFileSystem call). > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, HADOOP-15819.001.patch, > S3ACloseEnforcedFileSystem.java, S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711783#comment-16711783 ] Adam Antal commented on HADOOP-15819: - Hi everyone! I uploaded patch v1 with the fix. Remarks/explanation: * After thorough investigation of the cache I came to the conclusion that it is work as expected - I did not find any abnormal behaviour. The closed S3AFS must got into the cache somehow, so I tried further logging, and it showed that it was through the {{FileSystem}}'s following function: {code:java} @VisibleForTesting static void addFileSystemForTesting(URI uri, Configuration conf, FileSystem fs) throws IOException { CACHE.map.put(new Cache.Key(uri, conf), fs); } {code} * It turned out that among the hadoop-aws integration tests there's only one usage of this function, and the stack trace is the following: {code:java} CACHE.map.put(new Cache.Key(uri, conf), fs) -> FileSystem.addFileSystemForTesting(uri, conf, fs) -> FileSystemTestHelper.addFileSystemForTesting(uri, conf, wrapperFS) -> AbstractITCommitProtocol.bindFileSystem(FileSystem fs, Path path, Configuration conf) -> AbstractITCommitProtocol.setup() {code} * (Note that there are other usage in the test suite, but for integration tests this is the only one). * This function directly injects the FS into the FSCache (which can be in the cache already, but with another key), but as I wrote it in my previous comment, the caching is explicitly disabled in {{AbstractITCommitProtocol}} since in {{AbstractITCommitProtocol.createConfiguration()}} we call {{disableFilesystemCaching(conf)}}. * So this FS is in the cache, but it is never returned by {{FileSystem.get()}} since the cache is disabled. Instead a new S3AFileSystem object is returned in each call. * During teardown the FS is closed, and it's got removed from the cache *but the only removed key-value pair is the injected one.* If there was other key-value pair that has the same FS as value, it is kept in the cache. So after this action one key with one closed FS value as pair could remain in the FSCache. * The tests are running in the JVM so the next test case which tries to access a FS with {{s3a}} scheme without disabled cache gets this closed FS, and runs into the known error. * Since the cache is disabled there's no need to explicitly use that {{bindFileSystem}} method. My easy fix is to remove it. * This function is rarely used in tests, that may explain why we got this error and why no one has ever encountered errors like this. To sum it up, the cache is working as intended, but the misuse was the root of the issue. Maybe we should consider deleting it since it cause unforeseen effects like this (note that this function has only one usage except hadoop-aws). Please verify my observations and please also check that the fix is working indeed. I also note that {{ITestS3ACommitterFactory}} is still failing, there was another error which got hidden by the "closed fs" error. Regards, Adam > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703729#comment-16703729 ] Adam Antal commented on HADOOP-15819: - Strangely I rerun the tests in trunk with no modification, and got only one error in {{ITestS3ACommitterFactory}} (same as above). {noformat} [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.017 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory [ERROR] testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) Time elapsed: 0.016 s <<< ERROR! java.io.IOException: s3a://adamantal-hdfs-s3-dev: FileSystem is closed! at org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory.setup(ITestS3ACommitterFactory.java:94) {noformat} Probably this is due to the fixes in HADOOP-15947, HADOOP-15798 or HADOOP-15370. Still there is one more error, but it can be resolved with either disabling caching, or by investigating why the closed S3AFS is returned from {{FileSystem.get()}}. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15819.000.patch, S3ACloseEnforcedFileSystem.java, > S3ACloseEnforcedFileSystem.java, closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703616#comment-16703616 ] Adam Antal commented on HADOOP-15819: - Also some minor remark on {{S3ACloseEnforcedFileSystem}}: the method {{processDeleteOnExit()}} should not call the {{checkIfClosed()}} method in {{S3ACloseEnforcedFileSystem}}, because the former is only called after the {{close()}} (the order of the calls is {{S3ACloseEnforcedFileSystem:close() -> S3AFileSystem:close() -> FileSystem:close() -> S3ACloseEnforcedFileSystem:processDeleteOnExit()}}) thus creating misleading errors during the normal close of the filesystem. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703567#comment-16703567 ] Adam Antal commented on HADOOP-15819: - I investigated a bit further, and when the caching is disabled in {{AbstractS3ATestBase:createConfiguration()}} and {{ITestS3ACommitterFactory:setup()}} methods, the "Using closed FS!" exception does not appear. Note that the committer uses another config object, so it needs to be added there too. Strictly speaking there's one failing test, which is {{ITestS3AClosedFS}}, but this is due to the addition of the {{S3ACloseEnforcedFileSystem}}, which raises an exception after closing the fs, but that is exactly what we investigate, so that exception is irrelevant. If we decide later to explicitly disable the cache, I upload a patch which fixes the bug. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700891#comment-16700891 ] Adam Antal commented on HADOOP-15819: - Also tests were run in eu-ireland (eu-west-1). > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at > org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40) > at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700823#comment-16700823 ] Adam Antal commented on HADOOP-15819: - I looked through the affected tests, and maybe this comprehensive list can help to escalate this. I used [~gabor.bota]'s patch and the S3ACloseEnforcedFileSystem in it. Only the AbstractS3ATestBase's subtests are affected, and can be grouped by the following (also grouped further by common superclass): # Failing tests due to "Using closed FS!" with enabled cache ** ITestS3AInconsistency ** ITestS3AMiscOperations ** ITestS3AMiniYarnCluster ** ITestS3AEmptyDirectory ** ITestS3GuardEmptyDirs ** ITestS3ADelayedFNF ** ITestS3ACopyFromLocalFile ** ITestS3GuardListConsistency ** ITestS3AMetrics ** S3AScaleTestBase *** AbstractITestS3AMetadataStoreScale ITestLocalMetadataStoreScale ITestDynamoDBMetadataStoreScale *** ITestS3ADeleteManyFiles ITestS3ADeleteFilesOneByOne *** ITestS3AInputStreamPerformance *** ITestS3AConcurrentOps *** ITestS3ADirectoryPerformance *** ITestS3ACreatePerformance ** ITestS3GuardCreate ** ITestS3GuardWriteBack ** AbstractS3GuardToolTestBase *** ITestS3GuardToolDynamoDB *** ITestS3GuardToolLocal ** ITestAssumeRole ** ITestS3GuardConcurrentOps ** ITestS3AFileOperationCost ** ITestS3ABlocksize ** ITestS3AClosedFS ** AbstractCommitITest *** ITestS3ACommitterFactory # Tests not failing and having explicitly disabled cache ** ITestS3AMultipartUtils ** ITestS3GuardTtl ** ITestS3AFailureHandling ** AbstractSTestS3AHugeFiles *** ITestS3AHugeFilesDiskBlocks ITestS3AHugeFilesSSECDiskBlocks *** ITestS3AHugeMagicCommits *** ITestS3AHugeFilesArrayBlocks *** ITestS3AHugeFilesByteBufferBlocks ** ITestS3ABlockOutputArray *** ITestS3ABlockOutputByteBuffer *** ITestS3ABlockOutputDisk ** AbstractCommitITest *** AbstractITCommitProtocol ITestStagingCommitProtocol * ITestDirectoryCommitProtocol * ITestPartitionedCommitProtocol ITestMagicCommitProtocol # Already ignored tests in the testsuite ** ITestS3AEncryptionAlgorithmValidation # Ignored tests by me due to another stuff ** ITestS3ATemporaryCredentials: Bug in test ** AbstractCommitITest: Bug: BouncyCastle not found exception *** AbstractITCommitMRJob ITStagingCommitMRJobBadDest ITMagicCommitMRJob ITDirectoryCommitMRJob ITStagingCommitMRJob ITPartitionCommitMRJob ** AbstractTestS3AEncryption: Auth tests were not set up Apart from the ignored tests either by me or by default, the strange thing is that *tests have either failed or had explicitly disabled cache but not both*. In the list above you can see the list of tests, where the first category is where tests have failed but in the configuration the disabling cache option was not added (checked every level in the class hierarchy), and the second category where cache is explicitly disabled, but those tests have never failed during my examination. It has taken some time looking through them, but I checked every one of the tests (still I might have made a mistake) and every config having (indirect) superclass AbstractS3ATestBase. What's more interesting is that if I manually add a plus {{S3ATestUtils.disableFilesystemCaching}} call to the {{createConfiguration}} method of AbstractS3ATestBase (so not to the xml) I get only a single exception instead of the dozens above: only in {{ITestS3ACommitterFactory}}. I also add the output of the single failed test in that case: {code:java} 2018-11-27 18:55:08,036 [JUnit-testEverything] ERROR s3a.S3ACloseEnforcedFileSystem (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): java.lang.RuntimeException: Using closed FS!. at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) at org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.makeQualified(S3ACloseEnforcedFileSystem.java:115) at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.initOutput(AbstractS3ACommitter.java:149) at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.(AbstractS3ACommitter.java:129) at org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter.(MagicS3GuardCommitter.java:73) at org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitterFactory.createTaskCommitter(MagicS3GuardCommitterFactory.java:44) at org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory.createTaskCommitter(S3ACommitterFactory.java:81) at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:48) at org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory.createCommitter(ITestS3ACommitterFactory.java:198) at
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700370#comment-16700370 ] Gabor Bota commented on HADOOP-15819: - fyi [~adam.antal] I've created an issue for the problems when file caching is disabled: HADOOP-15796 > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at > org.apache.hadoop.fs.s3a.ITestS3AClosedFS.setup(ITestS3AClosedFS.java:40) > at
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645064#comment-16645064 ] Steve Loughran commented on HADOOP-15819: - maybe ask the fs itself, e,g {code} if (fs!=null) { fs.getCacheDisabled = fs.getConfiguration.getBoolean() | {code} Anyway, I think the "allow tests to explicitly declare when they want closing" is better > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644973#comment-16644973 ] Gabor Bota commented on HADOOP-15819: - I have some bad news: I get these issues for tests where {{FS_S3A_IMPL_DISABLE_CACHE}} set to true, so the caching is disabled. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644940#comment-16644940 ] Steve Loughran commented on HADOOP-15819: - bq. The FS cache really feels inherently broken in the parallel tests case, which is why I initially liked the idea of disabling caching for the tests. parallel tests run in their own JVMs, the main issue there is that you need to be confident that tests aren't writing to the same local/remote paths. At the same time, I've seen old test instances get recycled, which makes me thing that the parallel runner fields work out to instantiated test runners as they complete individual test cases. So reuse does happen, it just happens a lot more in sequential runs. Tests which want special configs of the FS can't handle recycled classes, hence the need for nee filesystems & close after, but I don't see why other tests should be closing much. * HADOOP-13131 added the close, along with {{S3ATestUtils.createTestFilesystem()}}, which does create filesystems that need to be closed. * But I don't see that creation happening much, especially given {{S3AContract}}'s FS is just from a get(). * Many tests call {{S3ATestUtils.disableFilesystemCaching(conf)}} before FileSystem.get, which guarantees unique instances Which makes me think: yes, closing the FS in teardown is overkill except in the special case of "creates a new filesystem() either explicilty or implicitly. As [~mackrorysd] says: surprising this hasn't surfaced before. But to fix it means that it should be done properly. * IF tests were always closed and new ones created (i.e. no catching), test cases run way, way faster. * those tests which do need their own FS instance can close it in teardown, and set it up. * And those tests which absolutely must have FS.get() Return their specific filesystem must: (a) enable caching and (b) remove their FS from the cache in teardown (e.g. FileSystem.closeAll) This is probably going to force a review of all the tests, maybe have some method in AbstractS3ATestBase {code} protected boolean uniqueFilesystemInstance() { return false; } {code} then # if true, in createConfiguration() call {{disableFilesystemCaching)) # if true in teardown: close the FS. Next: * go through all uses of the {{disableFilesystemCaching)), and in those tests have {{uniqueFilesystemInstance}} return true. * Look at uses of {{S3ATestUtils.createTestFilesystem()}} & make sure they are closing this after This is going to span all the tests. Joy > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644730#comment-16644730 ] Gabor Bota commented on HADOOP-15819: - I think we should focus on solving this case first, so it would be better not to talk about parallel test cases. The FS cache itself could be broken when running parallel tests, but I think it was never prepared to for parallel test run, and that change should be addressed in another issue since it's an entirely different topic. If we don't run parallel tests we still have this problem. The issue I see right now is that we close the filesystem after each test and I was not able to find another class that does this. I think it's very specific to AbstractS3ATestBase (so S3A) and it's 33 implementations. {{org.apache.hadoop.fs.s3a.AbstractS3ATestBase#teardown}} is called after each test, because the superclass annotation is {{@After}} on the teardown. With the current FS cache implementation, another test can get the same FS instance that the previous test closed. It can even happen if the tests running are from the *same class*. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643906#comment-16643906 ] Sean Mackrory commented on HADOOP-15819: So I think it was mentioned in the other JIRA that each test should be in their own JVM, but if we're seeing FS's in one test that were already closed, and we get a stack trace that points to another test, then something isn't right. Not closing the fs in tests seems like a possible solution, but (a) we need to rule out the possibility that this is a bug in the actual product, and (b) closing the FS does multiple important things and duplicating that everywhere isn't just a Band-Aid, it's kind of a messy one. The FS cache really feels inherently broken in the parallel tests case, which is why I initially liked the idea of disabling caching for the tests. If you rely on the FileSystem class to either give you an existing instance or create one for you, then we need to rely on it to decide when to close it. Otherwise people either break what other threads are doing or they don't close their FS instances. Both of those are horrible. Some kind of ref-counting, maybe? But modifying the design of the FS cache scares me, because it's been that way for a long time and is kind of important. Which raises the question: if I'm seeing this all the time recently but none of us saw it before a little while ago - what broke it? Because I don't think it was the FS caching. I wonder if git-bisect might be a useful way to get a lead here. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643749#comment-16643749 ] Gabor Bota commented on HADOOP-15819: - Also note: I was getting interesting errors when I removed the line, maybe related that we haven't closed the fs: {noformat} ERROR] Tests run: 9, Failures: 0, Errors: 9, Skipped: 0, Time elapsed: 0.778 s <<< FAILURE! - in org.apache.hadoop .fs.contract.s3a.ITestS3AContractRootDir [ERROR] testRecursiveRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.085 s <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string [ERROR] testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0. 082 s <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string [ERROR] testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.08 s <<< E RROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string [ERROR] testListEmptyRootDirectory(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.084 s <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir.testListEmptyRootDirectory(ITestS3AContractRoo tDir.java:63) [ERROR] testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.078 s < << ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string [ERROR] testSimpleRootListing(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.079 s << < ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string [ERROR] testMkDirDepth1(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.108 s <<< ERRO R! java.lang.IllegalArgumentException: Can not create a Path from an empty string [ERROR] testRmEmptyRootDirRecursive(org.apache.hadoop.fs.contract.s3a.ITestS3AContractRootDir) Time elapsed: 0.083 s <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string (..) {noformat} So there should be another solution to this that won't break the tests but also won't cause the {{FileSystem is closed!}} issue. Could we disable the caching just for these test and have them run on a new FS instance? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR]
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643742#comment-16643742 ] Gabor Bota commented on HADOOP-15819: - If I remove {{IOUtils.closeStream(getFileSystem());}} from org/apache/hadoop/fs/s3a/AbstractS3ATestBase.java:55, it fixes the issue. So the teardown would be {code:java} @Override public void teardown() throws Exception { super.teardown(); } {code} instead of {code:java} @Override public void teardown() throws Exception { super.teardown(); describe("closing file system"); IOUtils.closeStream(getFileSystem()); } {code} so even we could skip the override. We could even do something like {code:java} @Override public void teardown() throws Exception { super.teardown(); if(getConfiguration().getBoolean(FS_S3A_IMPL_DISABLE_CACHE, false)){ describe("closing file system"); IOUtils.closeStream(getFileSystem()); } } {code} but I was still getting some {{FileSystem is closed!}} when I used that method for closing the fs. I wanted to get to the bottom of this, so checked how fs caching works in general. We have a static fs cache in org.apache.hadoop.fs.FileSystem, that we use for cache fs instances. If we don't have an fs instance for the schema provided (e.g s3a://) we create a new instance and store it in the static cache - "If this is the first entry in the map and the JVM is not shutting down this registers a shutdown hook to close filesystems, and adds this FS to the toAutoClose if '{{fs.automatic.close}}' is set in the configuration (default: true).", so I don't see why do we have to close it manually between each an every test. What's your opinion on this [~ste...@apache.org]? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Assignee: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638351#comment-16638351 ] Gabor Bota commented on HADOOP-15819: - So that's {{fs.s3a.impl.disable.cache}} (S3ATestConstants#FS_S3A_IMPL_DISABLE_CACHE) set to false, and that's when we want to close the fs. Otherwise just keep it open? > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638306#comment-16638306 ] Gabor Bota commented on HADOOP-15819: - >From that perspective, every test could use the same instance one after >another - the only way to tell that the fs is not shared is when fs caching is >forced off with {{S3ATestUtils.disableFilesystemCaching(configuration);}}. > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) >
[jira] [Commented] (HADOOP-15819) S3A integration test failures: FileSystem is closed! - without parallel test run
[ https://issues.apache.org/jira/browse/HADOOP-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638286#comment-16638286 ] Steve Loughran commented on HADOOP-15819: - I see. we could find those FS.close() calls and only invoke them if the FS is known to be not-shared? e.g fs.getConf().get("whatever-that-shraed-property-is") to guard the call > S3A integration test failures: FileSystem is closed! - without parallel test > run > > > Key: HADOOP-15819 > URL: https://issues.apache.org/jira/browse/HADOOP-15819 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.1.1 >Reporter: Gabor Bota >Priority: Critical > Attachments: S3ACloseEnforcedFileSystem.java, > closed_fs_closers_example_5klines.log.zip > > > Running the integration tests for hadoop-aws {{mvn -Dscale verify}} against > Amazon AWS S3 (eu-west-1, us-west-1, with no s3guard) we see a lot of these > failures: > {noformat} > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.408 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITDirectoryCommitMRJob) > Time elapsed: 0.027 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 4.345 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob > [ERROR] > testStagingDirectory(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.021 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJob) > Time elapsed: 0.022 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.489 > s <<< FAILURE! - in > org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest > [ERROR] > testMRJob(org.apache.hadoop.fs.s3a.commit.staging.integration.ITStagingCommitMRJobBadDest) > Time elapsed: 0.023 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.695 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob > [ERROR] testMRJob(org.apache.hadoop.fs.s3a.commit.magic.ITMagicCommitMRJob) > Time elapsed: 0.039 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.015 > s <<< FAILURE! - in org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory > [ERROR] > testEverything(org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory) > Time elapsed: 0.014 s <<< ERROR! > java.io.IOException: s3a://cloudera-dev-gabor-ireland: FileSystem is closed! > {noformat} > The big issue is that the tests are running in a serial manner - no test is > running on top of the other - so we should not see that the tests are failing > like this. The issue could be in how we handle > org.apache.hadoop.fs.FileSystem#CACHE - the tests should use the same > S3AFileSystem so if A test uses a FileSystem and closes it in teardown then B > test will get the same FileSystem object from the cache and try to use it, > but it is closed. > We see this a lot in our downstream testing too. It's not possible to tell > that the failed regression test result is an implementation issue in the > runtime code or a test implementation problem. > I've checked when and what closes the S3AFileSystem with a sightly modified > version of S3AFileSystem which logs the closers of the fs if an error should > occur. I'll attach this modified java file for reference. See the next > example of the result when it's running: > {noformat} > 2018-10-04 00:52:25,596 [Thread-4201] ERROR s3a.S3ACloseEnforcedFileSystem > (S3ACloseEnforcedFileSystem.java:checkIfClosed(74)) - Use after close(): > java.lang.RuntimeException: Using closed FS!. > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.checkIfClosed(S3ACloseEnforcedFileSystem.java:73) > at > org.apache.hadoop.fs.s3a.S3ACloseEnforcedFileSystem.mkdirs(S3ACloseEnforcedFileSystem.java:474) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.mkdirs(AbstractFSContractTestBase.java:338) > at > org.apache.hadoop.fs.contract.AbstractFSContractTestBase.setup(AbstractFSContractTestBase.java:193) > at >