[jira] [Commented] (MAPREDUCE-5899) Support incremental data copy in DistCp

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006865#comment-14006865
 ] 

Hudson commented on MAPREDUCE-5899:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
MAPREDUCE-5899. Support incremental data copy in DistCp. Contributed by Jing 
Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596931)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/resources/DatanodeWebHdfsMethods.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetFileChecksum.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java


 Support incremental data copy in DistCp
 ---

 Key: MAPREDUCE-5899
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.5.0

 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, 
 MAPREDUCE-5899.002.patch, MAPREDUCE-5899.002.patch


 Currently, when doing distcp with the -update option, for two files with the 
 same file name but different file length or checksum, we overwrite the whole 
 file. It would be good if we could detect the case where (sourceFile = 
 targetFile + appended_data), and only transfer the appended data segment to 
 the target. This would be very useful when doing incremental distcp.
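The append-detection idea described above can be sketched as follows. This is a hypothetical illustration, not the actual patch: the real implementation compares file checksums computed over the target-length prefix rather than raw bytes, and the class and method names here are invented.

```java
import java.util.Arrays;

/** Sketch of the append-only detection idea (hypothetical names). */
public class AppendCheck {

    /**
     * Returns true in the case (sourceFile = targetFile + appended_data):
     * the target is a strict prefix of the source, so only the tail
     * needs to be transferred.
     */
    public static boolean isAppendOnly(byte[] source, byte[] target) {
        if (target.length >= source.length) {
            return false; // nothing was appended, or the files diverge in length
        }
        // Compare the source's prefix against the whole target.
        byte[] prefix = Arrays.copyOfRange(source, 0, target.length);
        return Arrays.equals(prefix, target);
    }
}
```

When this check succeeds, a copier would seek to `target.length` in the source and append only the remaining bytes, instead of overwriting the whole file.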



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006880#comment-14006880
 ] 

Hudson commented on MAPREDUCE-5309:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1781 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1781/])
MAPREDUCE-5309. 2.0.4 JobHistoryParser can't parse certain failed job history 
files generated by 2.0.3 history server. Contributed by Rushabh S Shah (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596295)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/avro/Events.avpr
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_0.23.9-FAILED.jhist
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.0.3-alpha-FAILED.jhist
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.4.0-FAILED.jhist


 2.0.4 JobHistoryParser can't parse certain failed job history files generated 
 by 2.0.3 history server
 -

 Key: MAPREDUCE-5309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
 Fix For: 3.0.0, 2.5.0

 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, 
 MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, 
 Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist


 When the 2.0.4 JobHistoryParser tries to parse a job history file generated 
 by Hadoop 2.0.3, the parser throws an error:
 java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
 at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
 at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
 at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
 at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
 at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
 at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
 at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
 at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at 

[jira] [Commented] (MAPREDUCE-5899) Support incremental data copy in DistCp

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006887#comment-14006887
 ] 

Hudson commented on MAPREDUCE-5899:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
MAPREDUCE-5899. Support incremental data copy in DistCp. Contributed by Jing 
Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596931)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/resources/DatanodeWebHdfsMethods.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetFileChecksum.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java


 Support incremental data copy in DistCp
 ---

 Key: MAPREDUCE-5899
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.5.0

 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, 
 MAPREDUCE-5899.002.patch, MAPREDUCE-5899.002.patch


 Currently, when doing distcp with the -update option, for two files with the 
 same file name but different file length or checksum, we overwrite the whole 
 file. It would be good if we could detect the case where (sourceFile = 
 targetFile + appended_data), and only transfer the appended data segment to 
 the target. This would be very useful when doing incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5309) 2.0.4 JobHistoryParser can't parse certain failed job history files generated by 2.0.3 history server

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006902#comment-14006902
 ] 

Hudson commented on MAPREDUCE-5309:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1755 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1755/])
MAPREDUCE-5309. 2.0.4 JobHistoryParser can't parse certain failed job history 
files generated by 2.0.3 history server. Contributed by Rushabh S Shah (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596295)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/avro/Events.avpr
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/EventReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_0.23.9-FAILED.jhist
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.0.3-alpha-FAILED.jhist
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_2.4.0-FAILED.jhist


 2.0.4 JobHistoryParser can't parse certain failed job history files generated 
 by 2.0.3 history server
 -

 Key: MAPREDUCE-5309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5309
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 2.0.4-alpha
Reporter: Vrushali C
Assignee: Rushabh S Shah
 Fix For: 3.0.0, 2.5.0

 Attachments: MAPREDUCE-5309-v2.patch, MAPREDUCE-5309-v3.patch, 
 MAPREDUCE-5309-v4.patch, MAPREDUCE-5309-v5.patch, MAPREDUCE-5309.patch, 
 Test20JobHistoryParsing.java, job_2_0_3-KILLED.jhist


 When the 2.0.4 JobHistoryParser tries to parse a job history file generated 
 by Hadoop 2.0.3, the parser throws an error:
 java.lang.ClassCastException: org.apache.avro.generic.GenericData$Array cannot be cast to org.apache.hadoop.mapreduce.jobhistory.JhCounters
 at org.apache.hadoop.mapreduce.jobhistory.TaskAttemptUnsuccessfulCompletion.put(TaskAttemptUnsuccessfulCompletion.java:58)
 at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
 at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
 at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
 at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
 at org.apache.hadoop.mapreduce.jobhistory.EventReader.getNextEvent(EventReader.java:93)
 at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:111)
 at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:156)
 at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:142)
 at com.twitter.somepackage.Test20JobHistoryParsing.testFileAvro(Test20JobHistoryParsing.java:23)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at 

[jira] [Assigned] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned MAPREDUCE-5777:


Assignee: zhihai xu

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu

 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.
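The fix requested above amounts to recognizing the three-byte UTF-8 BOM (EF BB BF) at the start of the input and skipping it before emitting data. A minimal, self-contained sketch of that check (hypothetical class name; the actual fix lives in the record reader):

```java
import java.nio.charset.StandardCharsets;

/** Sketch of UTF-8 BOM detection and skipping (hypothetical helper). */
public class BomUtil {

    /** Returns 3 if the buffer starts with the UTF-8 BOM (EF BB BF), else 0. */
    public static int bomLength(byte[] buf) {
        if (buf.length >= 3
                && (buf[0] & 0xFF) == 0xEF
                && (buf[1] & 0xFF) == 0xBB
                && (buf[2] & 0xFF) == 0xBF) {
            return 3;
        }
        return 0;
    }

    /** Decodes the buffer as UTF-8 text, skipping a leading BOM if present. */
    public static String decode(byte[] buf) {
        int skip = bomLength(buf);
        return new String(buf, skip, buf.length - skip, StandardCharsets.UTF_8);
    }
}
```

A record reader would apply this check only to the very first bytes of a file, so the BOM never appears as part of the first record's value.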



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5777:
-

Attachment: mapreduce-5777.patch

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: mapreduce-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5777:
-

Status: Patch Available  (was: Open)

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0, 0.22.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: mapreduce-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006984#comment-14006984
 ] 

zhihai xu commented on MAPREDUCE-5777:
--

I just uploaded a patch for review. The patch includes a new test case, 
testStripBOM, which verifies that the BOM is skipped by LineRecordReader, and 
adds a UTF-8 text file with a BOM that the test uses.
The new test case passes with this change to LineRecordReader and fails with 
the original code.

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: mapreduce-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-05-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007184#comment-14007184
 ] 

Wangda Tan commented on MAPREDUCE-5844:
---

Hi [~maysamyabandeh], 
Thanks for your patch. I think the headroom calculation currently needs to be 
improved in both the fair and capacity schedulers, so it would be better to 
make your method the default behavior (changing the time threshold to a 
reasonable number, in my opinion).
A suggestion: could we simply record the time we received the last mapper 
container, and use that time to check whether we have run into the 
hard-to-allocate-mapper situation? That would avoid modifying the 
ContainerRequest code.

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
 Attachments: MAPREDUCE-5844.patch


 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately--meaning that there was already enough space for the mapper.
 The logic for triggering preemption is at 
 RMContainerAllocator::preemptReducesIfNeeded
 The preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code} 
 where am: number of assigned mappers, |m| is mapper size, pr is number of 
 reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if headroom is not big enough for the 
 new mapper requests, reducers should be preempted. This would work if the job 
 is alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and it would require a separate headroom calculation 
 per queue/job.
 So, as a result, the headroom variable has effectively been given up on: 
 *headroom is always set to 0*. What this implies is that preemption becomes 
 very aggressive, not considering whether there is enough space for the 
 mappers or not.
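The triggering condition quoted above can be written out directly; with headroom pinned to 0 it reduces to `am * |m| + pr * |r| < mapResourceRequest`, which fires as soon as there is any unsatisfied map request. A minimal sketch under those definitions (hypothetical names; the real check lives in RMContainerAllocator::preemptReducesIfNeeded):

```java
/** Sketch of the reducer-preemption trigger described in the issue. */
public class PreemptionCheck {

    /**
     * Returns true when preemption would be triggered:
     *   headroom + am * |m| + pr * |r| < mapResourceRequest
     * where am = assigned mappers, |m| = mapper size, pr = reducers already
     * being preempted, |r| = reducer size.
     */
    public static boolean shouldPreempt(long headroom,
                                        int assignedMaps, long mapSize,
                                        int preemptedReducers, long reduceSize,
                                        long mapResourceRequest) {
        return headroom + (long) assignedMaps * mapSize
                + (long) preemptedReducers * reduceSize
                < mapResourceRequest;
    }
}
```

With `headroom = 0`, `assignedMaps = 0`, and `preemptedReducers = 0`, any positive `mapResourceRequest` makes this true, which is the aggressive behavior the issue describes.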



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.

2014-05-23 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007441#comment-14007441
 ] 

jay vyas commented on MAPREDUCE-5902:
-

FYI, a concrete example: these paths, whose job names seem to have been 
truncated at some point (i.e. {{ItemRatingVectorsMappe}} is clearly missing an 
R), are not getting picked up by the JobHistoryServer.

{noformat}
└── tom
├── 
job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
├── job_1400794299637_0010_conf.xml
├── job_1400794299637_0010.summary
├── 
job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
├── job_1400794299637_0011_conf.xml
├── job_1400794299637_0011.summary
├── 
job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
├── job_1400794299637_0012_conf.xml
└── job_1400794299637_0012.summary
{noformat}

 JobHistoryServer (HistoryFileManager) needs more debug logs.
 

 Key: MAPREDUCE-5902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: jay vyas
   Original Estimate: 1h
  Remaining Estimate: 1h

 With the JobHistoryServer, it appears that it's possible sometimes to skip 
 over certain history files. I haven't been able to determine why yet, but 
 I've found that some long-named .jhist files aren't getting collected into 
 the done/ directory.
 After tracing through the actual source and turning on DEBUG-level logging, 
 it became clear that this snippet is an important workhorse 
 (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles 
 ultimately boil down to scanDirectory()).
 It would be extremely useful, then, to have a couple of guarded logs at this 
 level of the code, so that we can see, in the log folders, why files are 
 being filtered out, i.e. whether it is due to filtering or visibility.
 {noformat}
   private static List<FileStatus> scanDirectory(Path path, FileContext fc,
       PathFilter pathFilter) throws IOException {
     path = fc.makeQualified(path);
     List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
     RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
     while (fileStatusIter.hasNext()) {
       FileStatus fileStatus = fileStatusIter.next();
       Path filePath = fileStatus.getPath();
       if (fileStatus.isFile() && pathFilter.accept(filePath)) {
         jhStatusList.add(fileStatus);
       }
     }
     return jhStatusList;
   }
 {noformat}
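The guarded logging the reporter asks for might look like the following simplified stand-in for scanDirectory (plain strings instead of FileStatus, java.util.logging instead of the actual logging framework; all names here are illustrative, not from the patch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Simplified stand-in for scanDirectory showing guarded debug logs. */
public class ScanSketch {
    private static final Logger LOG = Logger.getLogger(ScanSketch.class.getName());

    /** Stand-in for org.apache.hadoop.fs.PathFilter. */
    public interface PathFilter {
        boolean accept(String path);
    }

    public static List<String> scan(List<String> files, PathFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String f : files) {
            if (filter.accept(f)) {
                accepted.add(f);
            } else if (LOG.isLoggable(Level.FINE)) {
                // Guarded log: the message is only built when debug logging is
                // enabled, and it records *why* the file was skipped -- which is
                // exactly the visibility the reporter is asking for.
                LOG.fine("Skipping " + f + ": rejected by path filter");
            }
        }
        return accepted;
    }
}
```

The `isLoggable` guard keeps the string concatenation off the hot path when debug logging is disabled, so the extra logs cost nothing in production.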



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-5777:
-

Attachment: (was: mapreduce-5777.patch)

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: MAPREDUCE-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5899) Support incremental data copy in DistCp

2014-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007604#comment-14007604
 ] 

Hudson commented on MAPREDUCE-5899:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #5608 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5608/])
MAPREDUCE-5899. Support incremental data copy in DistCp. Contributed by Jing 
Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1596931)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FilterFileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/HarFileSystem.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestHarFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/resources/DatanodeWebHdfsMethods.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestGetFileChecksum.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestCopyMapper.java
* 
/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/mapred/TestRetriableFileCopyCommand.java


 Support incremental data copy in DistCp
 ---

 Key: MAPREDUCE-5899
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5899
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Reporter: Jing Zhao
Assignee: Jing Zhao
 Fix For: 2.5.0

 Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch, 
 MAPREDUCE-5899.002.patch, MAPREDUCE-5899.002.patch


 Currently, when doing distcp with the -update option, for two files with the 
 same file name but different file length or checksum, we overwrite the whole 
 file. It would be good if we could detect the case where (sourceFile = 
 targetFile + appended_data), and only transfer the appended data segment to 
 the target. This would be very useful when doing incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-4669) MRAM web UI over HTTPS does not work with Kerberos security enabled

2014-05-23 Thread Larry McCay (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007668#comment-14007668
 ] 

Larry McCay commented on MAPREDUCE-4669:


Hi [~tucu00] - Do you have any idea of the status of this issue? Thanks!

 MRAM web UI over HTTPS does not work with Kerberos security enabled
 ---

 Key: MAPREDUCE-4669
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 With Kerberos enabled, the MRAM runs as the user that submitted the job, thus 
 the MRAM process cannot read the cluster keystore files to get the 
 certificates to start its HttpServer using HTTPS.
 We need to decouple the keystore used by RM/NM/NN/DN (which are cluster 
 provided) from the keystore used by AMs (which ought to be user provided).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5900) Container preemption interpreted as task failures and eventually job failures

2014-05-23 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007680#comment-14007680
 ] 

Jian He commented on MAPREDUCE-5900:


Patch looks good overall.
I think we need a test case to verify that the attempt actually goes to the 
KILLED state. Maybe we can combine the test cases from MAPREDUCE-5848? We can 
give credit to both.

 Container preemption interpreted as task failures and eventually job failures 
 --

 Key: MAPREDUCE-5900
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5900
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mr-am, mrv2
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-5900-1.patch, MAPREDUCE-5900-trunk-1.patch


 The preemption exit code that was added needs to be incorporated: MR needs to 
 recognize the special exit code value of -102 and interpret it as a container 
 being killed instead of a container failure.
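The intended mapping can be sketched as follows. The value -102 corresponds to ContainerExitStatus.PREEMPTED in YARN, but the enum and method below are simplified stand-ins for the MR app master's actual event handling, not its real API.

```java
public class ExitCodeMapping {
  static final int PREEMPTED = -102; // ContainerExitStatus.PREEMPTED in YARN

  enum AttemptOutcome { KILLED, FAILED }

  /** A preempted container should count as a kill, never as a task failure. */
  static AttemptOutcome outcomeFor(int containerExitStatus) {
    if (containerExitStatus == PREEMPTED) {
      // Kills do not count toward the mapreduce.map/reduce.maxattempts limit,
      // so preemption can no longer escalate into a job failure.
      return AttemptOutcome.KILLED;
    }
    return AttemptOutcome.FAILED;
  }

  public static void main(String[] args) {
    System.out.println(outcomeFor(-102)); // KILLED
    System.out.println(outcomeFor(1));    // FAILED
  }
}
```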





[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007702#comment-14007702
 ] 

Hadoop QA commented on MAPREDUCE-5777:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646565/MAPREDUCE-5777.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4616//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4616//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4616//console

This message is automatically generated.

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: MAPREDUCE-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.
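The fix amounts to detecting the three-byte UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the input and skipping it. A stand-alone sketch of that check (not the actual record-reader change in the patch):

```java
public class BomSkipper {
  /** Returns the offset at which real data starts: 3 if the buffer opens with a UTF-8 BOM, else 0. */
  static int dataStart(byte[] buf) {
    if (buf.length >= 3
        && (buf[0] & 0xFF) == 0xEF
        && (buf[1] & 0xFF) == 0xBB
        && (buf[2] & 0xFF) == 0xBF) {
      return 3; // the BOM is encoding metadata, not key/value data
    }
    return 0;
  }

  public static void main(String[] args) {
    byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'k', 'e', 'y'};
    byte[] noBom = {'k', 'e', 'y'};
    System.out.println(dataStart(withBom)); // 3
    System.out.println(dataStart(noBom));   // 0
  }
}
```

A reader would begin consuming bytes at the returned offset, so the first key of a BOM-prefixed file no longer carries three invisible junk bytes.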





[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007707#comment-14007707
 ] 

zhihai xu commented on MAPREDUCE-5777:
--

The release audit warning is that the newly added text file (testBOM.txt) for 
the BOM test does not have an Apache license header.
I think it should be OK: the text file (testBOM.txt) is not a source code 
file, just a test resource file which has a BOM at the start of the file.


 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: MAPREDUCE-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.





[jira] [Commented] (MAPREDUCE-5900) Container preemption interpreted as task failures and eventually job failures

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007731#comment-14007731
 ] 

Hadoop QA commented on MAPREDUCE-5900:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12646389/MAPREDUCE-5900-trunk-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4617//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4617//console

This message is automatically generated.

 Container preemption interpreted as task failures and eventually job failures 
 --

 Key: MAPREDUCE-5900
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5900
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: applicationmaster, mr-am, mrv2
Affects Versions: 2.4.1
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: MAPREDUCE-5900-1.patch, MAPREDUCE-5900-trunk-1.patch


 The preemption exit code that was added needs to be incorporated: MR needs to 
 recognize the special exit code value of -102 and interpret it as a container 
 being killed instead of a container failure.





[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007814#comment-14007814
 ] 

Hadoop QA commented on MAPREDUCE-5777:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646565/MAPREDUCE-5777.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4618//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4618//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4618//console

This message is automatically generated.

 Support utf-8 text with BOM (byte order marker)
 ---

 Key: MAPREDUCE-5777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
 Attachments: MAPREDUCE-5777.patch


 UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and 
 friends should recognize the BOM and not treat it as actual data.





[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007913#comment-14007913
 ] 

Hadoop QA commented on MAPREDUCE-5844:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646359/MAPREDUCE-5844.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4619//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4619//console

This message is automatically generated.

 Reducer Preemption is too aggressive
 

 Key: MAPREDUCE-5844
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Maysam Yabandeh
Assignee: Maysam Yabandeh
 Attachments: MAPREDUCE-5844.patch


 We observed cases where the reducer preemption makes the job finish much 
 later, and the preemption does not seem to be necessary, since after 
 preemption both the preempted reducer and the mapper are assigned 
 immediately, meaning that there was already enough space for the mapper.
 The logic for triggering preemption is in 
 RMContainerAllocator::preemptReducesIfNeeded.
 Preemption is triggered if the following is true:
 {code}
 headroom + am * |m| + pr * |r| < mapResourceRequest
 {code}
 where am is the number of assigned mappers, |m| is the mapper size, pr is the 
 number of reducers being preempted, and |r| is the reducer size.
 The original idea apparently was that if the headroom is not big enough for 
 the new mapper requests, reducers should be preempted. This would work if the 
 job were alone in the cluster. Once we have queues, the headroom calculation 
 becomes more complicated and would require a separate headroom calculation 
 per queue/job.
 As a result, the headroom variable has essentially been given up on: 
 *headroom is always set to 0*. This implies that preemption becomes very 
 aggressive, never considering whether there is actually enough space for the 
 mappers.
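With the symbols defined above, the trigger condition can be written out directly; note what happens when headroom is pinned at 0. The method and parameter names below are illustrative, not the actual RMContainerAllocator fields.

```java
public class PreemptionTrigger {
  /** Preempt reducers when headroom + am * |m| + pr * |r| < mapResourceRequest. */
  static boolean shouldPreempt(long headroom, int assignedMaps, long mapSize,
                               int preemptedReducers, long reduceSize,
                               long mapResourceRequest) {
    return headroom + (long) assignedMaps * mapSize
         + (long) preemptedReducers * reduceSize
         < mapResourceRequest;
  }

  public static void main(String[] args) {
    // A true headroom of 4096 MB would cover a 2048 MB map request: no preemption needed.
    System.out.println(shouldPreempt(4096, 0, 1024, 0, 1024, 2048)); // false
    // With headroom hard-coded to 0, the identical cluster state triggers preemption.
    System.out.println(shouldPreempt(0, 0, 1024, 0, 1024, 2048));    // true
  }
}
```

The two calls differ only in the headroom argument, which is exactly the term the current code zeroes out, so reducers get preempted even when space for the mapper already exists.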





[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-23 Thread jay vyas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay vyas updated MAPREDUCE-5902:


Summary: JobHistoryServer (HistoryFileManager) needs more debug logs, fails 
to pick up jobs with % characters in the name.  (was: JobHistoryServer 
(HistoryFileManager) needs more debug logs.)

 JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up 
 jobs with % characters in the name.
 -

 Key: MAPREDUCE-5902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: jay vyas
   Original Estimate: 1h
  Remaining Estimate: 1h

 With the JobHistory Server, it appears that it is sometimes possible to skip 
 over certain history files. I haven't been able to determine why yet, but 
 I've found that some long-named .jhist files aren't getting collected into 
 the done/ directory.
 After tracing through the actual source and turning on DEBUG-level logging, 
 it became clear that this snippet is an important workhorse 
 (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles 
 ultimately boil down to scanDirectory()).
 It would be extremely useful, then, to have a couple of guarded logs at this 
 level of the code, so that we can see, in the log folders, why files are 
 being filtered out, i.e. whether it is due to filtering or visibility.
 {noformat}
   private static List<FileStatus> scanDirectory(Path path, FileContext fc,
       PathFilter pathFilter) throws IOException {
     path = fc.makeQualified(path);
     List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
     RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
     while (fileStatusIter.hasNext()) {
       FileStatus fileStatus = fileStatusIter.next();
       Path filePath = fileStatus.getPath();
       if (fileStatus.isFile() && pathFilter.accept(filePath)) {
         jhStatusList.add(fileStatus);
       }
     }
     return jhStatusList;
   }
 {noformat}
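The requested guarded logging could look like the sketch below. It uses java.util.logging in place of Hadoop's commons-logging LOG so it runs stand-alone, and the filter, messages, and method name are illustrative rather than the actual HistoryFileManager code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedScan {
  private static final Logger LOG = Logger.getLogger(GuardedScan.class.getName());

  /** Like scanDirectory, but records why each entry was dropped when debug logging is on. */
  static List<String> scan(List<String> fileNames, Predicate<String> pathFilter) {
    List<String> accepted = new ArrayList<>();
    for (String name : fileNames) {
      if (pathFilter.test(name)) {
        accepted.add(name);
      } else if (LOG.isLoggable(Level.FINE)) {
        // The guard avoids paying for string concatenation unless debug is enabled.
        LOG.fine("Skipping " + name + ": rejected by path filter");
      }
    }
    return accepted;
  }

  public static void main(String[] args) {
    List<String> files = List.of("job_1_ok.jhist", "job_2%2Dbad.jhist");
    System.out.println(scan(files, n -> !n.contains("%"))); // [job_1_ok.jhist]
  }
}
```

With such a log in place, a skipped .jhist file would leave a FINE-level trace naming the file and the reason, instead of disappearing silently.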





[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.

2014-05-23 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007959#comment-14007959
 ] 

jay vyas commented on MAPREDUCE-5902:
-

After further investigation, it appears that files with {{%}} escape characters 
in them aren't picked up by the JobHistoryServer. I'd like the opinion of one 
of the JobHistoryServer authors to confirm/deny whether job names are indeed 
allowed to include {{%}} signs, i.e. {{name%-myName}}.

Has anyone else seen this before? I'd be somewhat surprised if I were the only 
person who has run into it. I can't imagine it's a configuration error of any 
sort?

The files below appear to be stuck in mr-history purgatory: they are neither 
detectable as completed jobs via a REST request {{ curl 
http://10.1.4.138:19888/ws/v1/history/mapreduce/jobs | python -mjson.tool }} to 
the JobHistoryServer API, **nor** are they ever moved to {{/mr-history/done/}}:

{noformat}
/mr-history/tmp/tom/job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0017-1400814057680-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400814090466-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0016-140087454-tom-select+count%28*%29+from+bps_cleaned%28Stage%2D1%29-1400874621636-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0023-1400894507822-tom-name%252dname-1400894528285-1-1-SUCCEEDED-default.jhist
{noformat}
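The {{%252d}} fragment in the last stuck file name is consistent with a job name that was percent-encoded twice: a '-' escaped once becomes {{%2d}}, and re-encoding turns the '%' itself into {{%25}}. The sketch below uses java.net.URLEncoder purely to illustrate the double-encoding effect; it is not the escaping code Hadoop's history file naming actually uses.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DoubleEncode {
  public static void main(String[] args) throws UnsupportedEncodingException {
    String once = "%2d";                             // a '-' already escaped once
    String twice = URLEncoder.encode(once, "UTF-8"); // '%' -> %25; digits and letters pass through
    System.out.println(twice);                       // %252d, matching the stuck .jhist name
  }
}
```

If the history file name is escaped on write but the scanner's filter matches against the unescaped (or once-escaped) form, names containing '%' would fail the filter exactly as observed.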

 JobHistoryServer (HistoryFileManager) needs more debug logs.
 

 Key: MAPREDUCE-5902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: jay vyas
   Original Estimate: 1h
  Remaining Estimate: 1h

 With the JobHistory Server, it appears that it is sometimes possible to skip 
 over certain history files. I haven't been able to determine why yet, but 
 I've found that some long-named .jhist files aren't getting collected into 
 the done/ directory.
 After tracing through the actual source and turning on DEBUG-level logging, 
 it became clear that this snippet is an important workhorse 
 (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles 
 ultimately boil down to scanDirectory()).
 It would be extremely useful, then, to have a couple of guarded logs at this 
 level of the code, so that we can see, in the log folders, why files are 
 being filtered out, i.e. whether it is due to filtering or visibility.
 {noformat}
   private static List<FileStatus> scanDirectory(Path path, FileContext fc,
       PathFilter pathFilter) throws IOException {
     path = fc.makeQualified(path);
     List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
     RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
     while (fileStatusIter.hasNext()) {
       FileStatus fileStatus = fileStatusIter.next();
       Path filePath = fileStatus.getPath();
       if (fileStatus.isFile() && pathFilter.accept(filePath)) {
         jhStatusList.add(fileStatus);
       }
     }
     return jhStatusList;
   }
 {noformat}





[jira] [Updated] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up jobs with % characters in the name.

2014-05-23 Thread jay vyas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay vyas updated MAPREDUCE-5902:


Description: 
1) The JobHistoryServer sometimes skips over certain history files and never 
serves them as completed.

2) In addition to skipping these files, the JobHistoryServer doesn't 
effectively log which files are being skipped, and why.

So, in addition to determining why certain types of files are skipped (file 
name length doesn't appear to be the reason; rather, it appears that % 
characters throw the JobHistoryServer filter off), we should log completed 
.jhist files which are available in the mr-history/tmp directory yet are 
skipped for some reason.

** Regarding the actual bug: skipping completed jhist files **

We will need an author of the JobHistoryServer, I think, to chime in on what 
types of paths for jobs are actually valid. It appears that at least some 
characters, if present in a job name, will make the JobHistoryServer skip 
recognition of a completed jhist file.

** Regarding logging **
It would be extremely useful, then, to have a couple of guarded logs at this 
level of the code, so that we can see, in the log folders, why files are being 
filtered out, i.e. whether it is due to filtering or visibility.

{noformat}

  private static List<FileStatus> scanDirectory(Path path, FileContext fc,
      PathFilter pathFilter) throws IOException {
    path = fc.makeQualified(path);
    List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
    RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
    while (fileStatusIter.hasNext()) {
      FileStatus fileStatus = fileStatusIter.next();
      Path filePath = fileStatus.getPath();
      if (fileStatus.isFile() && pathFilter.accept(filePath)) {
        jhStatusList.add(fileStatus);
      }
    }
    return jhStatusList;
  }

{noformat}

** Reproducing ** 

I was able to reproduce this bug by writing a custom mapreduce job with a job 
name that contained % characters. I have also seen this with a version of the 
Mahout ParallelALSFactorizationJob, which includes - characters in its name 
that wind up getting replaced by %2D at some stage in the job pipeline.


  was:
With the JobHistory Server, it appears that it is sometimes possible to skip 
over certain history files. I haven't been able to determine why yet, but I've 
found that some long-named .jhist files aren't getting collected into the 
done/ directory.

After tracing through the actual source and turning on DEBUG-level logging, it 
became clear that this snippet is an important workhorse 
(scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles ultimately 
boil down to scanDirectory()).

It would be extremely useful, then, to have a couple of guarded logs at this 
level of the code, so that we can see, in the log folders, why files are being 
filtered out, i.e. whether it is due to filtering or visibility.

{noformat}

  private static List<FileStatus> scanDirectory(Path path, FileContext fc,
      PathFilter pathFilter) throws IOException {
    path = fc.makeQualified(path);
    List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
    RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
    while (fileStatusIter.hasNext()) {
      FileStatus fileStatus = fileStatusIter.next();
      Path filePath = fileStatus.getPath();
      if (fileStatus.isFile() && pathFilter.accept(filePath)) {
        jhStatusList.add(fileStatus);
      }
    }
    return jhStatusList;
  }

{noformat}




 JobHistoryServer (HistoryFileManager) needs more debug logs, fails to pick up 
 jobs with % characters in the name.
 -

 Key: MAPREDUCE-5902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: jay vyas
   Original Estimate: 1h
  Remaining Estimate: 1h

 1) The JobHistoryServer sometimes skips over certain history files and never 
 serves them as completed.
 2) In addition to skipping these files, the JobHistoryServer doesn't 
 effectively log which files are being skipped, and why.
 So, in addition to determining why certain types of files are skipped (file 
 name length doesn't appear to be the reason; rather, it appears that % 
 characters throw the JobHistoryServer filter off), we should log completed 
 .jhist files which are available in the mr-history/tmp directory yet are 
 skipped for some reason.
 ** Regarding the actual bug: skipping completed jhist files **
 We will need an author of the JobHistoryServer, I think, to chime in on what 
 types of paths for jobs are actually valid. It appears that at least some 
 characters, if in a job name, will make the