[jira] [Commented] (MAPREDUCE-5899) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006865#comment-14006865 ] Hudson commented on MAPREDUCE-5899: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) MAPREDUCE-5899. Support incremental data copy in DistCp. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596931)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/resources/DatanodeWebHdfsMethods.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetFileChecksum.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java
Support incremental data copy in DistCp --- Key: MAPREDUCE-5899 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.5.0 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, MAPREDUCE-5899.002.patch, MAPREDUCE-5899.002.patch
Currently when doing distcp with the -update option, for two files with the same file name but a different file length or checksum, we overwrite the whole file. It would be good if we could detect the case where (sourceFile = targetFile + appended_data) and only transfer the appended data segment to the target. This will be very useful for incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
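The append-detection idea in the description above can be sketched as a small decision function. This is a hypothetical illustration, not the committed patch: the names `AppendCopyCheck`, `decide`, and `CopyAction` are ours, and the checksum inputs stand in for the length-bounded file checksum support the changed FileSystem/DFSClient files suggest; `srcPrefixChecksum` is assumed to be the checksum of the source file's first `tgtLen` bytes.

```java
import java.util.Arrays;

// Hypothetical sketch of DistCp's append-detection decision. If the target
// is byte-identical to a prefix of the source, only the appended tail needs
// to be transferred; otherwise the files diverged and a full copy is needed.
public class AppendCopyCheck {

    enum CopyAction { SKIP, APPEND, OVERWRITE }

    static CopyAction decide(long srcLen, long tgtLen,
                             byte[] srcPrefixChecksum,  // checksum of source's first tgtLen bytes
                             byte[] tgtChecksum) {      // checksum of the whole target file
        boolean prefixMatches = Arrays.equals(srcPrefixChecksum, tgtChecksum);
        if (srcLen == tgtLen && prefixMatches) {
            return CopyAction.SKIP;       // identical files: nothing to copy
        }
        if (srcLen > tgtLen && prefixMatches) {
            return CopyAction.APPEND;     // sourceFile = targetFile + appended_data
        }
        return CopyAction.OVERWRITE;      // diverged content: full copy
    }
}
```

Under this sketch, an incremental distcp run would only fall back to a full overwrite when the target is not a clean prefix of the source.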
[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006880#comment-14006880 ] Hudson commented on MAPREDUCE-5309: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/]) MAPREDUCE-5309. 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server. Contributed by Rushabh S Shah (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596295)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/avro/Events.avpr
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_0.23.9-FAILED.jhist
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.0.3-alpha-FAILED.jhist
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.4.0-FAILED.jhist
2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server - Key: MAPREDUCE-5309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.0.4-alpha Reporter: Vrushali C Assignee: Rushabh S Shah Fix For: 3.0.0, 2.5.0 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, the JobHistoryParser throws an error:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
    at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
    at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
    at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[jira] [Commented] (MAPREDUCE-5899) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006887#comment-14006887 ] Hudson commented on MAPREDUCE-5899: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) MAPREDUCE-5899. Support incremental data copy in DistCp. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596931)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/resources/DatanodeWebHdfsMethods.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetFileChecksum.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
* /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java
Support incremental data copy in DistCp --- Key: MAPREDUCE-5899 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.5.0 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, MAPREDUCE-5899.002.patch, MAPREDUCE-5899.002.patch
Currently when doing distcp with the -update option, for two files with the same file name but a different file length or checksum, we overwrite the whole file. It would be good if we could detect the case where (sourceFile = targetFile + appended_data) and only transfer the appended data segment to the target. This will be very useful for incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006902#comment-14006902 ] Hudson commented on MAPREDUCE-5309: --- FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/]) MAPREDUCE-5309. 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server. Contributed by Rushabh S Shah (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596295)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/avro/Events.avpr
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_0.23.9-FAILED.jhist
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.0.3-alpha-FAILED.jhist
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.4.0-FAILED.jhist
2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server - Key: MAPREDUCE-5309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, mrv2 Affects Versions: 2.0.4-alpha Reporter: Vrushali C Assignee: Rushabh S Shah Fix For: 3.0.0, 2.5.0 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist
When the 2.0.4 JobHistoryParser tries to parse a job history file generated by Hadoop 2.0.3, the JobHistoryParser throws an error:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
    at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
    at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
    at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
    at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
    at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
[jira] [Assigned] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu reassigned MAPREDUCE-5777: Assignee: zhihai xu Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5777: - Attachment: mapreduce-5777.patch Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: mapreduce-5777.patch UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5777: - Status: Patch Available (was: Open) Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0, 0.22.0 Reporter: bc Wong Assignee: zhihai xu Attachments: mapreduce-5777.patch UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006984#comment-14006984 ] zhihai xu commented on MAPREDUCE-5777: -- I just uploaded a patch for review. This patch includes a test case to verify that the BOM is skipped by LineRecordReader. This patch also adds a UTF-8 text file with a BOM, which is used by the new test case: testStripBOM. The new test case passes with this change in LineRecordReader and fails with the original code. Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: mapreduce-5777.patch UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
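The BOM handling the patch describes can be illustrated with a small standalone sketch. This is not the actual LineRecordReader change; `BomSkipper` and `skipUtf8Bom` are hypothetical names, and the sketch only shows the core idea: peek at the first three bytes and consume them only if they are the UTF-8 BOM (0xEF 0xBB 0xBF).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical sketch: skip a leading UTF-8 BOM so it is not treated as data.
public class BomSkipper {

    // Returns a stream positioned past the UTF-8 BOM if one is present,
    // otherwise positioned at the original first byte.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.read(head, 0, 3);
        boolean isBom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n);  // no BOM: push the peeked bytes back
        }
        return pb;
    }
}
```

In the real record reader the same check would run only at the start of the file (offset 0), since a BOM is only meaningful there.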
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007184#comment-14007184 ] Wangda Tan commented on MAPREDUCE-5844: --- Hi [~maysamyabandeh], Thanks for your patch. I think that currently the headroom calculation needs to be much improved in the fair and capacity schedulers, so it's better to make your method the default behavior (change the time threshold to 0 or another reasonable number, in my opinion). A suggestion: can we simply record the time we got the last mapper container, and use that time to check whether we have run into a hard-to-allocate-mapper situation? That would avoid modifying the ContainerRequest code. Reducer Preemption is too aggressive Key: MAPREDUCE-5844 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: MAPREDUCE-5844.patch
We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job is alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and it would require a separate headroom calculation per queue/job. So, as a result, the headroom variable is effectively given up currently: *headroom is always set to 0*. What this implies for preemption is that it becomes very aggressive, not considering whether there is enough space for the mappers or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
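The trigger condition quoted in the report can be made concrete with a small sketch. The names below are ours, not RMContainerAllocator's, and the formula is transcribed directly from the description; note how pinning `headroom` to 0, as the report says happens in practice, makes the check fire whenever the outstanding mapper request exceeds the assigned-mapper and preempted-reducer resources alone.

```java
// Hypothetical sketch of the preemption trigger described in MAPREDUCE-5844:
//   headroom + am * |m| + pr * |r| < mapResourceRequest
public class PreemptionCheck {

    static boolean shouldPreemptReducers(long headroom,
                                         int assignedMappers, long mapperSize,
                                         int preemptedReducers, long reducerSize,
                                         long mapResourceRequest) {
        // Resources considered "available" for the pending mapper request.
        long available = headroom
            + assignedMappers * mapperSize
            + preemptedReducers * reducerSize;
        return available < mapResourceRequest;  // true => preempt a reducer
    }
}
```

With headroom forced to 0 and no mappers assigned or reducers already being preempted, any nonzero mapper request triggers preemption, which is exactly the over-aggressiveness the report describes.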
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007441#comment-14007441 ] jay vyas commented on MAPREDUCE-5902: - FYI, a concrete example: these paths, whose job names seem to have been truncated at some point (i.e. {{ItemRatingVectorsMappe}} is clearly missing an R), are not getting picked up by the JobHistoryServer.
{noformat}
└── tom
    ├── job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0010_conf.xml
    ├── job_1400794299637_0010.summary
    ├── job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0011_conf.xml
    ├── job_1400794299637_0011.summary
    ├── job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0012_conf.xml
    └── job_1400794299637_0012.summary
{noformat}
JobHistoryServer (HistoryFileManager) needs more debug logs. Key: MAPREDUCE-5902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: jay vyas Original Estimate: 1h Remaining Estimate: 1h
With the JobHistoryServer, it appears that it's possible sometimes to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing some in the actual source, and turning on DEBUG level logging, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()).
It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.
{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
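The guarded logging the reporter asks for could look like the sketch below. To keep it self-contained it uses plain java.nio.file instead of Hadoop's FileContext/FileStatus/PathFilter types, and a boolean flag stands in for `LOG.isDebugEnabled()`; the class and method names are ours, and only the shape of the scan loop mirrors the snippet above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical analogue of scanDirectory with guarded debug logs, so the
// operator can see WHY a .jhist file was dropped: not a regular file, or
// rejected by the path filter.
public class ScanWithLogs {

    static boolean debugEnabled = true;  // stand-in for LOG.isDebugEnabled()

    static List<Path> scanDirectory(Path dir, Predicate<Path> filter)
            throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                boolean isFile = Files.isRegularFile(p);
                boolean accepted = isFile && filter.test(p);
                if (accepted) {
                    result.add(p);
                } else if (debugEnabled) {
                    // Guarded log: only built when debug logging is on.
                    System.err.println("scanDirectory skipping " + p
                        + " isFile=" + isFile
                        + " acceptedByFilter=" + accepted);
                }
            }
        }
        return result;
    }
}
```

The guard matters because the skip message is built per file; on a large intermediate directory an unguarded log would pay the string-concatenation cost on every scan even when debug logging is off.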
[jira] [Updated] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-5777: - Attachment: (was: mapreduce-5777.patch) Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: MAPREDUCE-5777.patch UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5899) Support incremental data copy in DistCp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007604#comment-14007604 ] Hudson commented on MAPREDUCE-5899: --- SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5608/]) MAPREDUCE-5899. Support incremental data copy in DistCp. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1596931) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/resources/DatanodeWebHdfsMethods.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetFileChecksum.java * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java * 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java * /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java Support incremental data copy in DistCp --- Key: MAPREDUCE-5899 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.5.0 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, MAPREDUCE-5899.002.patch, MAPREDUCE-5899.002.patch Currently when doing distcp with -update option, for two files with the same file names but with different file length or checksum, we overwrite the whole file. It will be good if we can detect the case where (sourceFile = targetFile + appended_data), and only transfer the appended data segment to the target. This will be very useful if we're doing incremental distcp. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-4669) MRAM web UI over HTTPS does not work with Kerberos security enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007668#comment-14007668 ] Larry McCay commented on MAPREDUCE-4669: Hi [~tucu00] - Do you have any idea of the status of this issue? Thanks! MRAM web UI over HTTPS does not work with Kerberos security enabled --- Key: MAPREDUCE-4669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am Affects Versions: 2.0.3-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur With Kerberos enabled, the MRAM runs as the user that submitted the job, thus the MRAM process cannot read the cluster keystore files to get the certificates to start its HttpServer using HTTPS. We need to decouple the keystore used by RM/NM/NN/DN (which are cluster provided) from the keystore used by AMs (which ought to be user provided). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5900) Container preemption interpreted as task failures and eventually job failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007680#comment-14007680 ] Jian He commented on MAPREDUCE-5900: Patch looks good overall. I think we need a test case to verify that the state of the attempt actually goes to the KILLED state. Maybe we can combine the test cases from MAPREDUCE-5848? We can give credit to both. Container preemption interpreted as task failures and eventually job failures -- Key: MAPREDUCE-5900 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5900 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mr-am, mrv2 Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: MAPREDUCE-5900-1.patch, MAPREDUCE-5900-trunk-1.patch We have added a preemption exit code that needs to be incorporated: MR needs to recognize the special exit code value of -102 and interpret it as the container being killed, instead of a container failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007702#comment-14007702 ] Hadoop QA commented on MAPREDUCE-5777: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646565/MAPREDUCE-5777.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4616//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4616//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4616//console This message is automatically generated. Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: MAPREDUCE-5777.patch UTF-8 text may have a BOM. 
TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
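The behavior the issue asks for can be sketched as a simple byte-level check. This is a hedged illustration, not the actual patch: it assumes the record reader can drop the three-byte UTF-8 BOM (EF BB BF) before handing the first line to the mapper, and the class name `BomStripper` is invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch: strip a UTF-8 BOM from the start of a byte buffer
// so it is not treated as actual data. Not the Hadoop implementation.
public class BomStripper {
    // The UTF-8 byte order mark is the three bytes EF BB BF.
    static byte[] stripUtf8Bom(byte[] bytes) {
        if (bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(bytes, 3, bytes.length);
        }
        return bytes; // no BOM present, return data unchanged
    }
}
```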
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007707#comment-14007707 ] zhihai xu commented on MAPREDUCE-5777: -- The release audit warning is that the newly added text file (testBOM.txt) for the BOM test does not have an Apache license header. I think it should be OK: testBOM.txt is not a source code file, just a test resource file which has a BOM at the start of the file. Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: MAPREDUCE-5777.patch UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5900) Container preemption interpreted as task failures and eventually job failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007731#comment-14007731 ] Hadoop QA commented on MAPREDUCE-5900: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646389/MAPREDUCE-5900-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4617//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4617//console This message is automatically generated. 
Container preemption interpreted as task failures and eventually job failures -- Key: MAPREDUCE-5900 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5900 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster, mr-am, mrv2 Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: MAPREDUCE-5900-1.patch, MAPREDUCE-5900-trunk-1.patch We have added a preemption exit code that needs to be incorporated: MR needs to recognize the special exit code value of -102 and interpret it as the container being killed instead of as a container failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
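The proposed behavior can be written out in a few lines. A minimal sketch, assuming -102 is the preemption exit code as stated in the issue; `ContainerExitClassifier` and `Outcome` are invented names for illustration, not the actual Hadoop classes.

```java
// Sketch of how an AM might classify a container's exit status so that
// preemption is not counted toward the task-failure limit. Illustrative only.
public class ContainerExitClassifier {
    static final int PREEMPTED = -102; // special exit code described in the issue

    enum Outcome { SUCCEEDED, KILLED_BY_PREEMPTION, FAILED }

    static Outcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return Outcome.SUCCEEDED;
        }
        // A preempted container was killed by the platform, not a task failure,
        // so it must not push the job toward failure.
        if (exitStatus == PREEMPTED) {
            return Outcome.KILLED_BY_PREEMPTION;
        }
        return Outcome.FAILED;
    }
}
```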
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007814#comment-14007814 ] Hadoop QA commented on MAPREDUCE-5777: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646565/MAPREDUCE-5777.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4618//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4618//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4618//console This message is automatically generated. Support utf-8 text with BOM (byte order marker) --- Key: MAPREDUCE-5777 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0, 2.2.0 Reporter: bc Wong Assignee: zhihai xu Attachments: MAPREDUCE-5777.patch UTF-8 text may have a BOM. 
TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007913#comment-14007913 ] Hadoop QA commented on MAPREDUCE-5844: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646359/MAPREDUCE-5844.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4619//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4619//console This message is automatically generated. 
Reducer Preemption is too aggressive Key: MAPREDUCE-5844 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: MAPREDUCE-5844.patch We observed cases where reducer preemption makes the job finish much later, and the preemption does not seem to be necessary, since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is in RMContainerAllocator::preemptReducesIfNeeded. The preemption is triggered if the following is true:
{code}
headroom + am * |m| + pr * |r| < mapResourceRequest
{code}
where am is the number of assigned mappers, |m| is the mapper size, pr is the number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if the headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job were alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and would require a separate headroom calculation per queue/job. So, as a result, the headroom variable has effectively been given up on currently: *headroom is always set to 0*. What this implies is that preemption becomes very aggressive, not considering whether there is actually enough space for the mappers or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
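The trigger condition above can be written out as a small predicate. This is a sketch under the issue's own definitions (headroom, am, |m|, pr, |r|); the method name and MB units are illustrative, and the real logic lives in RMContainerAllocator::preemptReducesIfNeeded. With headroom pinned to 0, the check degenerates to comparing only the capacity being freed against the map request, which is why preemption fires so aggressively.

```java
// Illustrative sketch of the reducer-preemption trigger discussed in the issue.
// Names and units are assumptions; not the Hadoop implementation.
public class PreemptionCheck {
    static boolean shouldPreemptReducer(long headroomMb,
                                        int assignedMappers, long mapperSizeMb,
                                        int reducersBeingPreempted, long reducerSizeMb,
                                        long mapResourceRequestMb) {
        // headroom + am * |m| + pr * |r|
        long availableSoon = headroomMb
                + (long) assignedMappers * mapperSizeMb           // freed as assigned maps finish
                + (long) reducersBeingPreempted * reducerSizeMb;  // freed by in-flight preemptions
        // Preempt another reducer only if the pending map request still does not fit.
        return availableSoon < mapResourceRequestMb;
    }
}
```

Note that with headroomMb forced to 0 (as the issue reports), the first term vanishes and the condition is true whenever no capacity is visibly being freed, regardless of actual cluster space.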
[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jay vyas updated MAPREDUCE-5902: Summary: JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name. (was: JobHistoryServer (HistoryFileManager) needs more debug logs.) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name. - Key: MAPREDUCE-5902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: jay vyas Original Estimate: 1h Remaining Estimate: 1h With the JobHistoryServer, it appears that it's possible sometimes to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing through the actual source and turning on DEBUG level logging, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()). It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.
{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
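The guarded logging the reporter asks for might look like the following. This is a self-contained sketch: a plain Predicate<String> stands in for Hadoop's PathFilter and System.err stands in for the commons-logging LOG, so the only point being made is the shape of the debug-enabled guard that records *why* an entry was skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of a directory scan with guarded debug logging of skipped entries.
// Hypothetical stand-in for HistoryFileManager.scanDirectory(), not the real code.
public class ScanWithDebugLogs {
    // Stand-in for LOG.isDebugEnabled(); toggled via -Dscan.debug=true.
    static final boolean DEBUG = Boolean.getBoolean("scan.debug");

    static List<String> scan(List<String> entries, Predicate<String> filter) {
        List<String> accepted = new ArrayList<>();
        for (String name : entries) {
            if (filter.test(name)) {
                accepted.add(name);
            } else if (DEBUG) {
                // Guarded so the message is only built when debug logging is on,
                // and the log shows exactly why the file was filtered out.
                System.err.println("skipping " + name + ": rejected by path filter");
            }
        }
        return accepted;
    }
}
```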
[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007959#comment-14007959 ] jay vyas commented on MAPREDUCE-5902: - After further investigation, it appears that files with {{ % escape characters }} in them aren't picked up by the JobHistoryServer. I'd like the opinion of one of the JobHistoryServer authors to confirm/deny whether job names are indeed allowed to include {{%}} signs in them, i.e. {{name%-myName}}. Has anyone else seen this before? I'd be somewhat surprised if I were the only person who has run into it; I can't imagine it's a configuration error of any sort? The below files appear to be stuck in mr-history purgatory: neither are they detectable as completed jobs from a REST request {{ curl http://10.1.4.138:19888/ws/v1/history/mapreduce/jobs | python -mjson.tool }} to the JobHistoryServer API, **nor** are they ever moved to {{/mr-history/done/}}
{noformat}
/mr-history/tmp/tom/job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0017-1400814057680-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400814090466-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0016-140087454-tom-select+count%28*%29+from+bps_cleaned%28Stage%2D1%29-1400874621636-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0023-1400894507822-tom-name%252dname-1400894528285-1-1-SUCCEEDED-default.jhist
{noformat}
JobHistoryServer (HistoryFileManager) needs more debug logs.
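The {{%252d}} in the last stuck file name above is consistent with a job name being percent-encoded twice: some step in the pipeline first escapes {{-}} to {{%2D}}, and a second encoding pass then escapes the {{%}} itself to {{%25}}. The following sketch demonstrates only that second pass, using java.net.URLEncoder; whether the history pipeline actually uses URLEncoder (or which step does the first escaping) is an assumption, not something the issue confirms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Demonstrates double percent-encoding: a string already containing "%2D"
// gets its '%' re-escaped to "%25", producing "%252D". Illustrative only.
public class DoubleEncodingDemo {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }
}
```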
Key: MAPREDUCE-5902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: jay vyas Original Estimate: 1h Remaining Estimate: 1h With the JobHistoryServer, it appears that it's possible sometimes to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing through the actual source and turning on DEBUG level logging, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()). It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.
{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jay vyas updated MAPREDUCE-5902: Description: 1) JobHistoryServer sometimes skips over certain history files, and ignores serving them as completed. 2) In addition to skipping these files, the JobHistoryServer doesn't effectively log which files are being skipped, and why. So, in addition to determining why certain types of files are skipped (file name length doesn't appear to be the reason; rather, it appears that % characters throw the JobHistoryServer filter off), we should log completed .jhist files which are available in the mr-history/tmp directory yet are skipped for some reason. ** Regarding the actual bug: Skipping completed jhist files ** We will need an author of the JobHistoryServer, I think, to chime in on what types of paths for jobs are actually valid. It appears that at least some characters, if in a job name, will make the JobHistoryServer skip recognition of a completed jhist file. ** Regarding logging ** It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.
{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
** Reproducing ** I was able to reproduce this bug by writing a custom mapreduce job with a job name which contained % characters.
I have also seen this with a version of the Mahout ParallelALSFactorizationJob, which includes - characters in its name, which wind up getting replaced by %2D later on at some stage in the job pipeline. was: With the JobHistoryServer, it appears that it's possible sometimes to skip over certain history files. I haven't been able to determine why yet, but I've found that some long-named .jhist files aren't getting collected into the done/ directory. After tracing through the actual source and turning on DEBUG level logging, it became clear that this snippet is an important workhorse (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately boil down to scanDirectory()). It would be extremely useful, then, to have a couple of guarded logs at this level of the code, so that we can see, in the log folders, why files are being filtered out, i.e. whether it is due to filtering or visibility.
{noformat}
private static List<FileStatus> scanDirectory(Path path, FileContext fc,
    PathFilter pathFilter) throws IOException {
  path = fc.makeQualified(path);
  List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
  RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
  while (fileStatusIter.hasNext()) {
    FileStatus fileStatus = fileStatusIter.next();
    Path filePath = fileStatus.getPath();
    if (fileStatus.isFile() && pathFilter.accept(filePath)) {
      jhStatusList.add(fileStatus);
    }
  }
  return jhStatusList;
}
{noformat}
JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name. - Key: MAPREDUCE-5902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: jay vyas Original Estimate: 1h Remaining Estimate: 1h 1) JobHistoryServer sometimes skips over certain history files, and ignores serving them as completed. 2) In addition to skipping these files, the JobHistoryServer doesn't effectively log which files are being skipped, and why.
So, in addition to determining why certain types of files are skipped (file name length doesn't appear to be the reason; rather, it appears that % characters throw the JobHistoryServer filter off), we should log completed .jhist files which are available in the mr-history/tmp directory yet are skipped for some reason. ** Regarding the actual bug: Skipping completed jhist files ** We will need an author of the JobHistoryServer, I think, to chime in on what types of paths for jobs are actually valid. It appears that at least some characters, if in a job name, will make the