[jira] [Commented] (HADOOP-16207) Fix ITestDirectoryCommitMRJob.testMRJob
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934667#comment-16934667 ]

Siddharth Seth commented on HADOOP-16207:

Attached a simple patch which fixes just the test failures. It doesn't do anything about parallelism, making dir names differ across tests, etc. I can submit this in a separate JIRA if this one is being used for parallelizing the tests.

> Fix ITestDirectoryCommitMRJob.testMRJob
> ---
>
> Key: HADOOP-16207
> URL: https://issues.apache.org/jira/browse/HADOOP-16207
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Critical
> Attachments: HADOOP-16207.fixtestsonly.txt
>
>
> Reported failure of {{ITestDirectoryCommitMRJob}} in validation runs of
> HADOOP-16186; assertIsDirectory with s3guard enabled and a parallel test run:
> Path "is recorded as deleted by S3Guard"
> {code}
> waitForConsistency();
> assertIsDirectory(outputPath); /* here */
> {code}
> The file is there but there's a tombstone. Possibilities:
> * some race condition with another test
> * tombstones aren't timing out
> * committers aren't creating that base dir in a way which cleans up S3Guard's
> tombstones.
> Remember: we do have to delete that dest dir before the committer runs unless
> overwrite==true, so at the start of the run there will be a tombstone. It
> should be overwritten by a success.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
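To make the tombstone hypothesis in the description concrete, here is a toy model, not S3Guard's actual code. The hypothetical `ToyGuardedFs` puts a metadata store in front of an object store: deleting a path records a tombstone, and lookups consult the metadata store first. It assumes, as the description says, that a successful write through the guarded path should overwrite the tombstone, while a write that bypasses the metadata store leaves it in place, so the object exists but is still "recorded as deleted".

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the raw object store (not the S3A API). */
class ToyObjectStore {
    final Set<String> objects = new HashSet<>();
}

/** Hypothetical metadata store: each entry is either "exists" or a tombstone. */
class ToyMetadataStore {
    private final Map<String, Boolean> entries = new HashMap<>(); // true = exists, false = tombstone

    void recordDeleted(String path) { entries.put(path, Boolean.FALSE); }
    void recordCreated(String path) { entries.put(path, Boolean.TRUE); }
    Boolean lookup(String path) { return entries.get(path); } // null = no entry
}

/** Hypothetical guarded filesystem combining the two stores. */
class ToyGuardedFs {
    final ToyObjectStore store = new ToyObjectStore();
    final ToyMetadataStore meta = new ToyMetadataStore();

    /** Delete the object and leave a tombstone, as before a non-overwrite job. */
    void delete(String path) {
        store.objects.remove(path);
        meta.recordDeleted(path);
    }

    /** Create through the guarded path: the tombstone is overwritten. */
    void mkdirsGuarded(String path) {
        store.objects.add(path);
        meta.recordCreated(path);
    }

    /** Create while bypassing the metadata store: the tombstone survives. */
    void mkdirsUnguarded(String path) {
        store.objects.add(path);
    }

    /** A tombstone wins over the object store, mirroring the failing assertion. */
    void assertIsDirectory(String path) throws FileNotFoundException {
        if (Boolean.FALSE.equals(meta.lookup(path))) {
            throw new FileNotFoundException(path + " is recorded as deleted by the metadata store");
        }
        if (!store.objects.contains(path)) {
            throw new FileNotFoundException(path + " does not exist");
        }
    }
}
```

Calling `delete("out")` followed by `mkdirsUnguarded("out")` reproduces the reported symptom (object present, assertion fails); `mkdirsGuarded("out")` does not.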
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934024#comment-16934024 ]

Siddharth Seth commented on HADOOP-16207:

Also, to run the tests in parallel, the jobs need to start using different directory names. Currently all of them use testMRJob (the method name in the common class that all the tests inherit from). The local dir conflict is, AFAIK, an MR configuration issue (likely the MR tmp dir config property). YARN clusters should already be able to run in parallel (different ports, random dir names, etc.).
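One way to stop every subclass from writing to the same testMRJob directory is to derive the directory from the concrete test class as well as the method name. This is a hypothetical sketch: `AbstractCommitJobTest`, `jobDirName()`, the subclasses and the `/test/` prefix are illustrative names, not the ones in `AbstractITCommitMRJob`. It only relies on the fact that `getClass()` inside an inherited method returns the runtime subclass.

```java
/** Hypothetical base class standing in for the shared MR-job test. */
abstract class AbstractCommitJobTest {
    /**
     * Builds a directory name unique to the concrete test class.
     * getClass() is evaluated on the runtime instance, so each
     * subclass gets its own name even though this method is inherited.
     */
    String jobDirName(String methodName) {
        return "/test/" + getClass().getSimpleName() + "/" + methodName;
    }
}

// Hypothetical subclasses mirroring the committer-specific tests.
class DirectoryCommitJobTest extends AbstractCommitJobTest { }

class MagicCommitJobTest extends AbstractCommitJobTest { }
```

Under this scheme, the directory and magic committer tests would write to different paths even though both run the inherited testMRJob method; using the class name rather than a random suffix keeps the paths predictable when debugging.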
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933718#comment-16933718 ]

Siddharth Seth commented on HADOOP-16207:

Seeing several MR job failures when running tests on HADOOP-16445.
{code}
[ERROR] ITestMagicCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327 » FileNotFound
[ERROR] ITestDirectoryCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327 » FileNotFound
[ERROR] ITestPartitionCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327 » FileNotFound
[ERROR] ITestStagingCommitMRJob>AbstractITCommitMRJob.testMRJob:146->AbstractFSContractTestBase.assertIsDirectory:327 » FileNotFound
{code}
These always fail when run with -Ds3guard -Ddynamo -Dauth (they fail when starting with a clean DDB table as well).

The test setup seems broken to me.
* Cluster setup happens with createCluster(new JobConf()).
* After this, AbstractITCommitMRJob creates the MR job with Job.getInstance(getClusterBinding().getConf() ..., which ends up using the previously created JobConf.
* JobConf will only read core-site.xml ... so the command-line parameters -Ds3guard, -Ddynamo and -Dauth make no difference.

Adding fs.s3a.metadatastore.authoritative=true and fs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore to auth-keys.xml or core-site.xml fixed all the test failures for me. (With these additions, the JobConf used by the cluster has these configs, and the tests do what they're supposed to.) That isn't the correct fix, though. Making sure the test configuration is used to create the JobConf for the cluster and the jobs would allow the test properties to work.

That said, I did see 3 empty (and marked as deleted) files - part_, part_0001, _SUCCESS - in the s3guard table.
I suspect this is a result of the committer trying to access a file on the client, getting a cached FileSystem instance (same UGI), and the getFileStatus call (maybe) creating these S3Guard DDB entries?
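The configuration problem above comes down to which object the cluster's JobConf is built from. Hadoop's `JobConf` does have a `JobConf(Configuration)` copy constructor, but to stay self-contained this sketch models the issue with a hypothetical `ToyConf` (a map with two constructors) rather than the real classes: a conf built from site defaults never sees properties set only on the test configuration, while one copied from the test configuration does.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Hadoop's Configuration/JobConf. */
class ToyConf {
    private final Map<String, String> props = new HashMap<>();

    /** Like "new JobConf()": only the site defaults are loaded. */
    ToyConf(Map<String, String> siteDefaults) {
        props.putAll(siteDefaults);
    }

    /** Like "new JobConf(conf)": copies every property of an existing conf. */
    ToyConf(ToyConf other) {
        props.putAll(other.props);
    }

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}
```

In the toy, setting `fs.s3a.metadatastore.impl` on the test conf only reaches a conf created through the copy constructor, which is the behaviour the suggested fix relies on; where exactly the real cluster binding should make that change is not shown here.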
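On the cached-instance suspicion: Hadoop's `FileSystem.get()` caches instances keyed on the URI scheme and authority plus the current UGI, so code running as the same user against the same bucket gets the same instance back. The sketch below is a simplified, hypothetical model of such a cache (`ToyFsCache`, with the UGI reduced to a user-name string), not Hadoop's implementation. It only shows why a client-side lookup by the committer would share one instance, and therefore its state, with the test's own filesystem.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Simplified, hypothetical model of a filesystem instance cache. */
class ToyFsCache {
    /** Cache key: scheme + authority + user, loosely mirroring Hadoop's cache key. */
    static final class Key {
        final String scheme;
        final String authority;
        final String user;

        Key(URI uri, String user) {
            this.scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
            this.authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase();
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme) && authority.equals(k.authority) && user.equals(k.user);
        }

        @Override
        public int hashCode() { return Objects.hash(scheme, authority, user); }
    }

    /** Placeholder filesystem client; only its identity matters here. */
    static final class ToyFs {
        final URI root;
        ToyFs(URI root) { this.root = root; }
    }

    private final Map<Key, ToyFs> cache = new HashMap<>();

    /** Returns the cached instance for (scheme, authority, user), creating it once. */
    ToyFs get(URI uri, String user) {
        return cache.computeIfAbsent(new Key(uri, user), k -> new ToyFs(uri));
    }
}
```

Paths under the same bucket share one instance for a given user, while a different user gets a separate one.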
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887945#comment-16887945 ]

Steve Loughran commented on HADOOP-16207:

The staging problem is fixed by MAPREDUCE-6521. As it only seems to surface when the cluster FS is local (unconfirmed), it's not likely to be the cause of the previous failures (when HDFS was used as the cluster FS). And, given that it seems to be a race condition, it doesn't explain why we'd see failures during sequential test runs.
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887471#comment-16887471 ]

Steve Loughran commented on HADOOP-16207:

Working on this. Finally got a log, and (currently) /tmp/hadoop-yarn/staging/ doesn't exist. Assumption: all the miniYarnClusters are sharing the same /tmp staging dir, so when one is shut down while another is running, the second one fails because all its staging files go away. In which case, yes, it is a race condition, at least this time.
{code}
(TaskAttemptListenerImpl.java:fatalError(288)) - Task: attempt_1563401248365_0003_m_00_0 - exited : java.io.FileNotFoundException: File file:/tmp/hadoop-yarn/staging/stevel/.staging/job_1563401248365_0003/job.split does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:153)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:354)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:917)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:362)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
{code}
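The shared-staging-dir assumption can be illustrated with plain `java.nio.file`: two hypothetical clusters that resolve the same fixed staging root lose each other's files when one cleans up on shutdown, while clusters that each get a private temp directory do not. `ToyCluster` is illustrative only. For a real MiniYARNCluster the relevant knob is presumably the MapReduce AM staging directory, `yarn.app.mapreduce.am.staging-dir` (default `/tmp/hadoop-yarn/staging`, which matches the path in the log), but treat that as an assumption to check rather than a confirmed fix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical mini cluster that owns a staging root and deletes it on shutdown. */
class ToyCluster {
    final Path stagingRoot;

    ToyCluster(Path stagingRoot) { this.stagingRoot = stagingRoot; }

    /** Writes a job's split file under <stagingRoot>/<user>/.staging/<jobId>/. */
    Path stageJob(String user, String jobId) throws IOException {
        Path jobDir = stagingRoot.resolve(user).resolve(".staging").resolve(jobId);
        Files.createDirectories(jobDir);
        return Files.write(jobDir.resolve("job.split"), new byte[] {1});
    }

    /** Shutdown cleanup: recursively deletes the whole staging root, children first. */
    void shutdown() throws IOException {
        if (!Files.exists(stagingRoot)) return;
        try (Stream<Path> walk = Files.walk(stagingRoot)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Isolated variant: each cluster gets its own fresh temp directory. */
    static ToyCluster withPrivateStaging() throws IOException {
        return new ToyCluster(Files.createTempDirectory("staging-"));
    }
}
```

With a shared root, shutting down one cluster deletes the other's `job.split`, which is the shape of the `FileNotFoundException` above; with private roots the surviving cluster keeps its files.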
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841394#comment-16841394 ]

Steve Loughran commented on HADOOP-16207:

Suspecting a race condition across more than one test. If we isolate paths, this should go away.
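Path isolation across parallel test forks can be sketched as a per-fork prefix in front of the per-test path. The sketch assumes the build passes each JVM fork a unique id through a system property; `test.unique.fork.id` is the name used here and the `fork-0001` fallback is made up, so check the module's pom before relying on either. `IsolatedPaths` is a hypothetical helper.

```java
/** Hypothetical helper that builds a test path unique to this JVM fork and test class. */
final class IsolatedPaths {
    /** System property assumed to carry a per-fork id (an assumption, not verified here). */
    static final String FORK_ID_PROPERTY = "test.unique.fork.id";
    /** Made-up fallback used when the property is not set, e.g. in an IDE run. */
    static final String DEFAULT_FORK_ID = "fork-0001";

    private IsolatedPaths() { }

    /** Returns /<forkId>/<testClass>/<method>, so no two forks or classes share a dir. */
    static String testPath(Class<?> testClass, String method) {
        String fork = System.getProperty(FORK_ID_PROPERTY, DEFAULT_FORK_ID);
        return "/" + fork + "/" + testClass.getSimpleName() + "/" + method;
    }
}
```

Two forks running the same test class then write under different prefixes, so one fork's cleanup, and any tombstones it leaves, should not land on another fork's output.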
[ https://issues.apache.org/jira/browse/HADOOP-16207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825115#comment-16825115 ]

Steve Loughran commented on HADOOP-16207:

Fix HADOOP-16184 and, provided this is a non-auth test run, this test will act as a regression test to make sure the fix works in real situations.