[jira] [Assigned] (BEAM-3272) ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
[ https://issues.apache.org/jira/browse/BEAM-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles reassigned BEAM-3272:
-------------------------------------

    Assignee:     (was: Kenneth Knowles)

> ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-3272
>                 URL: https://issues.apache.org/jira/browse/BEAM-3272
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-apex
>            Reporter: Eugene Kirpichov
>            Priority: Critical
>              Labels: flake, sickbay
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Failed build:
> https://builds.apache.org/job/beam_PostCommit_Java_MavenInstall/org.apache.beam$beam-runners-apex/5330/console
> Key output:
> {code}
> 2017-11-29T01:21:26.956 [ERROR] testAssertionFailure(org.apache.beam.runners.apex.translation.ParDoTranslatorTest)  Time elapsed: 2.007 s  <<< ERROR!
> java.lang.RuntimeException: Error creating local cluster
> 	at org.apache.apex.engine.EmbeddedAppLauncherImpl.getController(EmbeddedAppLauncherImpl.java:122)
> 	at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:71)
> 	at org.apache.apex.engine.EmbeddedAppLauncherImpl.launchApp(EmbeddedAppLauncherImpl.java:46)
> 	at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:197)
> 	at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:57)
> 	at org.apache.beam.runners.apex.TestApexRunner.run(TestApexRunner.java:31)
> 	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:304)
> 	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:290)
> 	at org.apache.beam.runners.apex.translation.ParDoTranslatorTest.runExpectingAssertionFailure(ParDoTranslatorTest.java:156)
> {code}
> ...
> {code}
> Caused by: ExitCodeException exitCode=1: chmod: cannot access ‘/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/com.datatorrent.stram.StramLocalCluster/checkpoints/2/_tmp’: No such file or directory
> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
> 	at org.apache.hadoop.util.Shell.run(Shell.java:479)
> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
> 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
> 	at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
> 	at org.apache.hadoop.fs.FileSystem.primitiveCreate(FileSystem.java:1017)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:99)
> 	at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:352)
> 	at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:399)
> 	at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:584)
> 	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:686)
> 	at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:682)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.create(FileContext.java:688)
> 	at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:119)
> 	... 50 more
> {code}
> Inspecting the code at these stack frames, it appears the runner is trying to copy an operator's checkpoint "to HDFS" (which in this case is the local disk), but fails while creating the target file of the copy: creation makes the file successfully, then the chmod that marks it writable fails. Barring something subtle (e.g. chmod not being allowed immediately after creating a FileOutputStream), it looks as if the whole directory was deleted out from under the process. I don't know why that would happen, or how to debug it.
> Either way, the path being accessed is funky:
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Java_MavenInstall/src/runners/apex/target/...
> I think it would be better if this test used a "@Rule TemporaryFolder" to store Apex checkpoints. I don't know whether the Apex runner allows that, but I can see how it could help reduce interference between tests and
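The hypothesized failure mode (the file is created successfully, then the chmod-equivalent fails because the enclosing directory vanished) can be sketched in plain Java. This is a hypothetical simplification: the real cause of the directory disappearing is unknown, so the sketch deletes it explicitly to stand in for whatever external cleanup occurred; the class name `CheckpointChmodRace` is invented for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class CheckpointChmodRace {
    /** Returns true iff setting permissions fails after the directory vanishes. */
    static boolean reproduce() throws IOException {
        Path dir = Files.createTempDirectory("checkpoints");
        Path tmp = Files.createFile(dir.resolve("_tmp")); // creation succeeds

        // Simulate the directory being deleted out from under the process.
        Files.delete(tmp);
        Files.delete(dir);

        try {
            // Analogous to RawLocalFileSystem.setPermission shelling out to chmod.
            Files.setPosixFilePermissions(tmp, PosixFilePermissions.fromString("rw-r--r--"));
            return false;
        } catch (NoSuchFileException expected) {
            // Mirrors "chmod: cannot access '...': No such file or directory".
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(reproduce()
            ? "chmod failed after directory deletion, as hypothesized"
            : "chmod unexpectedly succeeded");
    }
}
```

Whether the real failure is exactly this (external deletion of `target/.../checkpoints`) or a subtler timing issue, a per-test temporary directory would at least isolate the checkpoints from other activity in the shared Maven `target/` tree.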
[jira] [Assigned] (BEAM-3272) ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
[ https://issues.apache.org/jira/browse/BEAM-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles reassigned BEAM-3272:
-------------------------------------

    Assignee: Kenneth Knowles  (was: Thomas Weise)

> ParDoTranslatorTest: Error creating local cluster while creating checkpoint file
> --------------------------------------------------------------------------------
>
>                 Key: BEAM-3272
>                 URL: https://issues.apache.org/jira/browse/BEAM-3272
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-apex
>            Reporter: Eugene Kirpichov
>            Assignee: Kenneth Knowles
>            Priority: Critical
>              Labels: flake