[jira] [Updated] (MAPREDUCE-6785) ContainerLauncherImpl support for reusing the containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K updated MAPREDUCE-6785:
---------------------------------
    Assignee: Naganarasimha G R  (was: Devaraj K)
      Status: Open  (was: Patch Available)

Thanks [~Naganarasimha] for the patch. The patch looks fine to me except for the checkstyle warnings. I think we can handle these. Can you update the patch with fixes for the checkstyle issues shown in the report?

> ContainerLauncherImpl support for reusing the containers
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6785
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6785
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>            Reporter: Devaraj K
>            Assignee: Naganarasimha G R
>         Attachments: MAPREDUCE-6785-MR-6749.003.patch, MAPREDUCE-6785-MR-6749.004.patch, MAPREDUCE-6785-MR-6749.005.patch, MAPREDUCE-6785-MR-6749.006.patch, MAPREDUCE-6785-v0.patch, MAPREDUCE-6785-v1.patch, MAPREDUCE-6785-v2.patch
>
> Add support to the ContainerLauncher for reuse of containers.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6829) Add peak memory usage counter for each task
[ https://issues.apache.org/jira/browse/MAPREDUCE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954433#comment-15954433 ]

Miklos Szegedi commented on MAPREDUCE-6829:
-------------------------------------------
Thank you for the comment, [~mingma]. This jira primarily targeted branch-2. I agree that ATS should be the primary source of this information in versions 3 and above. Just as a side note, it was important to separate map and reduce numbers. Is this possible with YARN-3045?

> Add peak memory usage counter for each task
> -------------------------------------------
>
>                 Key: MAPREDUCE-6829
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6829
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: Yufei Gu
>            Assignee: Miklos Szegedi
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6829.000.patch, MAPREDUCE-6829.001.patch, MAPREDUCE-6829.002.patch, MAPREDUCE-6829.003.patch, MAPREDUCE-6829.004.patch, MAPREDUCE-6829.005.patch
>
> Each task has the counters PHYSICAL_MEMORY_BYTES and VIRTUAL_MEMORY_BYTES, which are snapshots of the memory usage of that task. They are not sufficient for users to understand the peak memory usage of that task, e.g. in order to diagnose task failures, tune job parameters, or change application design. This new feature adds two more counters for each task: PHYSICAL_MEMORY_BYTES_MAX and VIRTUAL_MEMORY_BYTES_MAX.
> This JIRA provides the same feature as MAPREDUCE-4710. I filed this new JIRA since MAPREDUCE-4710 is a pretty old one from the MR 1.x era; it more or less assumes a branch-1 architecture and should be closed at this point.
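The difference between the existing snapshot counters and the proposed peak counters can be sketched outside Hadoop. A snapshot counter only reflects the most recent sample, so a task that spikes mid-run and then settles reports a misleadingly low value; a running maximum retains the peak. The sketch below uses a hypothetical `track_memory` helper and made-up sample values, not the Hadoop implementation:

```python
# Hypothetical sketch: why a snapshot counter misses the peak.
# 'snapshot' mimics PHYSICAL_MEMORY_BYTES (latest sample only);
# 'peak' mimics PHYSICAL_MEMORY_BYTES_MAX (running maximum).

def track_memory(samples):
    """Return (last_snapshot, peak) over a series of memory-usage samples."""
    snapshot = 0
    peak = 0
    for usage in samples:
        snapshot = usage          # overwritten on every sample
        peak = max(peak, usage)   # monotonically tracks the high-water mark
    return snapshot, peak

# A task that spikes mid-run, then settles down before finishing:
snapshot, peak = track_memory([100, 900, 400, 200])
print(snapshot, peak)  # 200 900
```

The final snapshot (200) would suggest the task fit comfortably in memory, while the peak (900) is what actually matters for diagnosing container-limit failures or tuning memory requests.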
[jira] [Commented] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954409#comment-15954409 ]

Sangjin Lee commented on MAPREDUCE-6846:
----------------------------------------
The latest patch LGTM. [~dan...@cloudera.com] what do you think?

> Fragments specified for libjar paths are not handled correctly
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-6846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>            Priority: Minor
>         Attachments: MAPREDUCE-6846-trunk.001.patch, MAPREDUCE-6846-trunk.002.patch, MAPREDUCE-6846-trunk.003.patch, MAPREDUCE-6846-trunk.004.patch, MAPREDUCE-6846-trunk.005.patch, MAPREDUCE-6846-trunk.006.patch
>
> If a user specifies a fragment for a libjars path via the generic options parser, the client crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
>     at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
>     at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
>     at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
>     at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>     at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>     at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {noformat}
> This is actually inconsistent with the behavior for files and archives. Here is a table showing the current behavior for each type of path and resource:
> ||            || Qualified path (i.e. file://home/mapred/test.txt#frag.txt) || Absolute path (i.e. /home/mapred/test.txt#frag.txt) || Relative path (i.e. test.txt#frag.txt) ||
> || -libjars   | FileNotFound | FileNotFound | FileNotFound |
> || -files     | (/) | (/) | (/) |
> || -archives  | (/) | (/) | (/) |
[jira] [Commented] (MAPREDUCE-6829) Add peak memory usage counter for each task
[ https://issues.apache.org/jira/browse/MAPREDUCE-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954401#comment-15954401 ]

Ming Ma commented on MAPREDUCE-6829:
------------------------------------
With YARN-3045, is this still necessary? Container-level metrics like this seem quite useful for frameworks other than MR, and this is something YARN can provide if it hasn't been done already.

> Add peak memory usage counter for each task
> -------------------------------------------
>
>                 Key: MAPREDUCE-6829
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6829
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: Yufei Gu
>            Assignee: Miklos Szegedi
>             Fix For: 2.9.0
[jira] [Commented] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954025#comment-15954025 ]

Chris Trezzo commented on MAPREDUCE-6846:
-----------------------------------------
Any additional comments [~templedf]? Thanks!

> Fragments specified for libjar paths are not handled correctly
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-6846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>            Priority: Minor
>         Attachments: MAPREDUCE-6846-trunk.001.patch, MAPREDUCE-6846-trunk.002.patch, MAPREDUCE-6846-trunk.003.patch, MAPREDUCE-6846-trunk.004.patch, MAPREDUCE-6846-trunk.005.patch, MAPREDUCE-6846-trunk.006.patch
[jira] [Commented] (MAPREDUCE-6824) TaskAttemptImpl#createCommonContainerLaunchContext is longer than 150 lines
[ https://issues.apache.org/jira/browse/MAPREDUCE-6824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953886#comment-15953886 ]

Chris Trezzo commented on MAPREDUCE-6824:
-----------------------------------------
Thanks [~ajisakaa] and [~haibochen]!

> TaskAttemptImpl#createCommonContainerLaunchContext is longer than 150 lines
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6824
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6824
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>            Priority: Trivial
>              Labels: newbie
>             Fix For: 3.0.0-alpha3
>
>         Attachments: MAPREDUCE-6824-trunk.001.patch, MAPREDUCE-6824-trunk.002.patch, MAPREDUCE-6824-trunk.003.patch
>
> bq. ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java:752: private static ContainerLaunchContext createCommonContainerLaunchContext(:3: Method length is 172 lines (max allowed is 150).
> {{TaskAttemptImpl#createCommonContainerLaunchContext}} is longer than 150 lines and needs to be refactored.
[jira] [Commented] (MAPREDUCE-6874) Make DistributedCache check if the content of a directory has changed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953500#comment-15953500 ]

Attila Sasvari commented on MAPREDUCE-6874:
-------------------------------------------
[~jlowe] Thanks for the detailed explanation. I was wondering why the distributed cache accepted a directory as input, but now I understand this is for legacy reasons. The negative impact on performance is also clear. Closing this as Won't Fix.

> Make DistributedCache check if the content of a directory has changed
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6874
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6874
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
>
> DistributedCache does not check recursively whether the content of a directory has changed when adding files to it with {{DistributedCache.addCacheFile()}}.
> h5. Background
> I have an Oozie workflow on HDFS:
> {code}
> example_workflow
> ├── job.properties
> ├── lib
> │   ├── components
> │   │   ├── sub-component.sh
> │   │   └── subsub
> │   │       └── subsub.sh
> │   ├── main.sh
> │   └── sub.sh
> └── workflow.xml
> {code}
> I executed the workflow, then made some changes in {{subsub.sh}} and replaced the file on HDFS. When I re-ran the workflow, DistributedCache did not notice the changes because the timestamp on the {{components}} directory did not change. As a result, the old script was materialized.
> This behaviour might be related to [determineTimestamps()|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/filecache/ClientDistributedCacheManager.java#L84].
> In order to use the new script during workflow execution, I had to update the whole {{components}} directory.
> h6. Some more info:
> In Oozie, [DistributedCache.addCacheFile()|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L625] is used to add files to the distributed cache.
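The timestamp behaviour described in this issue is easy to reproduce outside Hadoop: on a typical POSIX filesystem, rewriting a file deep in a tree updates that file's mtime but not the mtime of an ancestor directory, which is all a single-timestamp check can see. A small self-contained Python demonstration (the paths mirror the workflow layout above but are created in a temporary directory; this is a sketch of the filesystem behaviour, not of Hadoop code):

```python
import os
import tempfile
import time

def demo():
    """Rewrite a nested script and report whether the ancestor
    directory's mtime changed (it does not on POSIX filesystems)."""
    root = tempfile.mkdtemp()
    components = os.path.join(root, "components")
    subsub = os.path.join(components, "subsub")
    os.makedirs(subsub)
    script = os.path.join(subsub, "subsub.sh")
    with open(script, "w") as f:
        f.write("echo v1\n")

    before = os.path.getmtime(components)
    time.sleep(0.01)
    with open(script, "w") as f:   # "replace" the nested script
        f.write("echo v2\n")
    after = os.path.getmtime(components)

    # True: the 'components' timestamp is unchanged, so a check based
    # on that single mtime cannot detect the modified subsub.sh.
    return before == after

print(demo())  # True
```

This is exactly why the old script kept being materialized: the one timestamp the cache records for the directory entry never moved.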
[jira] [Resolved] (MAPREDUCE-6874) Make DistributedCache check if the content of a directory has changed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Sasvari resolved MAPREDUCE-6874.
---------------------------------------
    Resolution: Won't Fix

> Make DistributedCache check if the content of a directory has changed
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6874
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6874
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
[jira] [Commented] (MAPREDUCE-6874) Make DistributedCache check if the content of a directory has changed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953441#comment-15953441 ]

Jason Lowe commented on MAPREDUCE-6874:
---------------------------------------
This is a limitation of the distributed cache. It can be very expensive to do a full-depth traversal of a directory tree, and the API only supports one timestamp per distributed cache entry. Not only is it expensive to stat the tree to see whether it has changed, it's also expensive to localize the files: there's RPC overhead for each file in the tree.

It is much more efficient, and safer, to use an archive (e.g. .tar.gz, .zip, etc.) instead of a directory. Then there's only one timestamp we need to check to know whether anything in the "tree" has changed. Arguably, directory trees shouldn't be supported in the distributed cache at all, but I believe they were added way back when to support use cases where a chain of MapReduce jobs needed the output of a previous job (i.e. a directory) to be used as a cache file for the next job (e.g. a map-side join).

> Make DistributedCache check if the content of a directory has changed
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6874
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6874
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
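The archive alternative suggested in this comment can be sketched with Python's standard library: packing the tree into a single .tar.gz means one file, one timestamp to check, and one localization. The sketch below covers only the packaging step (function and path names are illustrative; shipping the resulting archive via `-archives` or `addCacheArchive()` is the usual Hadoop workflow and is not shown):

```python
import os
import tarfile
import tempfile

def pack_components(src_dir, archive_path):
    """Pack a directory tree into one gzipped tar.

    The archive is a single file, so a single mtime check (and a
    single localization RPC) covers every file in the tree.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return archive_path

# Build a tiny stand-in for the 'components' tree and pack it.
root = tempfile.mkdtemp()
components = os.path.join(root, "components")
os.makedirs(os.path.join(components, "subsub"))
with open(os.path.join(components, "subsub", "subsub.sh"), "w") as f:
    f.write("echo hi\n")

archive = pack_components(components, os.path.join(root, "components.tar.gz"))
with tarfile.open(archive) as tar:
    names = tar.getnames()   # every path the archive preserves
print(sorted(names))
```

Rebuilding the archive after editing `subsub.sh` updates the archive file's own mtime, so a single-timestamp check detects the change, which the per-directory timestamp in the original report could not.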
[jira] [Created] (MAPREDUCE-6874) Make DistributedCache check if the content of a directory has changed
Attila Sasvari created MAPREDUCE-6874:
-----------------------------------------
             Summary: Make DistributedCache check if the content of a directory has changed
                 Key: MAPREDUCE-6874
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6874
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Attila Sasvari

DistributedCache does not check recursively whether the content of a directory has changed when adding files to it with {{DistributedCache.addCacheFile()}}.

h5. Background
I have an Oozie workflow on HDFS:
{code}
example_workflow
├── job.properties
├── lib
│   ├── components
│   │   ├── sub-component.sh
│   │   └── subsub
│   │       └── subsub.sh
│   ├── main.sh
│   └── sub.sh
└── workflow.xml
{code}
I executed the workflow, then made some changes in {{subsub.sh}} and replaced the file on HDFS. When I re-ran the workflow, DistributedCache did not notice the changes because the timestamp on the {{components}} directory did not change. As a result, the old script was materialized.

This behaviour might be related to [determineTimestamps()|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/filecache/ClientDistributedCacheManager.java#L84].

In order to use the new script during workflow execution, I had to update the whole {{components}} directory.

h6. Some more info:
In Oozie, [DistributedCache.addCacheFile()|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L625] is used to add files to the distributed cache.