[jira] [Updated] (MAPREDUCE-7159) FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used

2018-12-06 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7159:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.1
   3.3.0
   3.1.2
   Status: Resolved  (was: Patch Available)

Thanks to [~pbacsko] for the contribution and to [~wilfreds] for additional 
review!  I committed this to trunk, branch-3.2, and branch-3.1.


> FrameworkUploader: ensure proper permissions of generated framework tar.gz if 
> restrictive umask is used
> ---
>
> Key: MAPREDUCE-7159
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7159
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.1.2, 3.3.0, 3.2.1
>
> Attachments: MAPREDUCE-7159-001.patch, MAPREDUCE-7159-002.patch, 
> MAPREDUCE-7159-003.patch, MAPREDUCE-7159-004.patch, MAPREDUCE-7159-005.patch, 
> MAPREDUCE-7159-006.patch, MAPREDUCE-7159-007.patch
>
>
> Using certain umask values (like 027) makes files unreadable to "others". 
> This causes problems when the FrameworkUploader 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java)
> is used: the compressed MR framework must be readable by all users, otherwise 
> they won't be able to run MR jobs.






[jira] [Commented] (MAPREDUCE-7159) FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used

2018-12-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712010#comment-16712010
 ] 

Jason Lowe commented on MAPREDUCE-7159:
---

Thanks for updating the patch!  +1 lgtm.  Committing this.

> FrameworkUploader: ensure proper permissions of generated framework tar.gz if 
> restrictive umask is used
> ---
>
> Key: MAPREDUCE-7159
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7159
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7159-001.patch, MAPREDUCE-7159-002.patch, 
> MAPREDUCE-7159-003.patch, MAPREDUCE-7159-004.patch, MAPREDUCE-7159-005.patch, 
> MAPREDUCE-7159-006.patch, MAPREDUCE-7159-007.patch
>
>
> Using certain umask values (like 027) makes files unreadable to "others". 
> This causes problems when the FrameworkUploader 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java)
> is used: the compressed MR framework must be readable by all users, otherwise 
> they won't be able to run MR jobs.






[jira] [Commented] (MAPREDUCE-7159) FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used

2018-12-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711728#comment-16711728
 ] 

Jason Lowe commented on MAPREDUCE-7159:
---

Looks good overall, just a few nits.  The program will emit an error message, 
likely twice, since it logs it at the ERROR level and also prints it to stdout 
(not stderr?), yet it still returns a successful exit code.  I think these 
messages should be warnings, not errors, if the program is going to be 
considered a success; otherwise the program should return a non-zero exit code 
indicating there was an error.

Stack traces aren't the most end-user friendly things from CLI programs.  Do we 
really want to always print one?  I wonder whether it should be debug-logged 
instead, since the separate message seems pretty specific.  I'm not sure the 
stack trace is going to be that helpful in practice beyond what the message 
already conveys.
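
Something along these lines would address both points (a sketch only; the 
names are illustrative, not the actual patch):
{code:java}
boolean failed = false;
try {
  uploader.run();
} catch (IOException e) {
  LOG.error("Upload failed: " + e.getMessage());
  LOG.debug("Full stack trace", e);  // full trace only at DEBUG level
  failed = true;
}
System.exit(failed ? 1 : 0);  // non-zero exit code reflects the error
{code}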


> FrameworkUploader: ensure proper permissions of generated framework tar.gz if 
> restrictive umask is used
> ---
>
> Key: MAPREDUCE-7159
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7159
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7159-001.patch, MAPREDUCE-7159-002.patch, 
> MAPREDUCE-7159-003.patch, MAPREDUCE-7159-004.patch, MAPREDUCE-7159-005.patch, 
> MAPREDUCE-7159-006.patch
>
>
> Using certain umask values (like 027) makes files unreadable to "others". 
> This causes problems when the FrameworkUploader 
> (https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-uploader/src/main/java/org/apache/hadoop/mapred/uploader/FrameworkUploader.java)
> is used: the compressed MR framework must be readable by all users, otherwise 
> they won't be able to run MR jobs.






[jira] [Commented] (MAPREDUCE-7167) Extra LF ("\n") pushed directly to storage

2018-12-01 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706018#comment-16706018
 ] 

Jason Lowe commented on MAPREDUCE-7167:
---

I also think this is not going to be an incompatible change.  If it were, I 
would have expected changes on the reader side to compensate for the updated 
input format.  JSON readers should be silently skipping extra whitespace.
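
For reference, a minimal sketch of the write path after the patch, assuming 
Avro's JsonEncoder API (the variable names are illustrative):
{code:java}
DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
for (GenericRecord record : records) {
  writer.write(record, encoder);  // JsonEncoder emits the trailing "\n" itself
}
encoder.flush();  // drain the encoder's buffer before anything else touches out
// no out.writeBytes("\n") here -- a direct write would bypass the buffer
{code}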

> Extra LF ("\n") pushed directly to storage
> --
>
> Key: MAPREDUCE-7167
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7167
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Saurabh
>Assignee: Saurabh
>Priority: Major
> Attachments: image-2018-11-28-19-23-52-972.png, 
> image-2018-11-29-14-53-58-176.png, image-2018-11-29-14-54-28-254.png, 
> nremoved.txt, nremoved.txt, patch1128.patch, patch1128.patch, 
> patch1128trunk.patch, withn.txt, withn.txt
>
>
> JsonEncoder already adds the necessary newline after writing each object, as 
> per 
> [this|https://github.com/apache/avro/blob/39ec1a3f0addfce06869f705f7a17c03d538fe16/lang/java/avro/src/main/java/org/apache/avro/io/JsonEncoder.java#L77],
> so this patch removes the {{out.writeBytes("\n");}} call. Because the encoder 
> is buffered while out.writeBytes writes directly to the underlying output 
> stream, the direct write can corrupt the JSON in the output stream, hence it 
> must be removed.






[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-29 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703379#comment-16703379
 ] 

Jason Lowe commented on MAPREDUCE-7164:
---

This case occurred when the namenode was being hammered and was responding 
pretty slowly. A job unrelated to that namenode storm was using an output 
committer that had a very large number of output files per task. Each task was 
serially renaming outputs to the final output directory (i.e.: using commit 
algorithm v2).  The namenode was slow to respond, so the time it took to get 
through the serial list of rename operations exceeded the task timeout.  The AM 
killed the task because it was not reporting any progress during this long 
operation and assumed it was stuck.  This JIRA essentially updates the v2 
commit algorithm to match the [behavior of the Hadoop 1.x 
FileOutputCommitter|https://github.com/apache/hadoop/blob/branch-1.0/src/mapred/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L193].
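
As a rough sketch of the idea (assumed names, simplified from the real 
recursive merge logic), the v2 merge loop just needs to poke the progress 
reporter on each rename:
{code:java}
private void mergePaths(FileSystem fs, Path from, Path to,
    Progressable progress) throws IOException {
  for (FileStatus status : fs.listStatus(from)) {
    fs.rename(status.getPath(), new Path(to, status.getPath().getName()));
    progress.progress();  // resets the task's progress timeout in the AM
  }
}
{code}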

> FileOutputCommitter does not report progress while merging paths.
> -
>
> Key: MAPREDUCE-7164
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5, 2.9.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: MAPREDUCE-7164.001.patch, MAPREDUCE-7164.002.patch
>
>
> In cases where the rename and merge path logic takes more time than usual, 
> the committer does not report progress and can cause job failure. This 
> behavior was not present in Hadoop 1.x. This JIRA will fix it so that the old 
> behavior for 1.x is restored.






[jira] [Updated] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-28 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7164:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.1
   3.3.0
   3.1.2
   3.0.4
   Status: Resolved  (was: Patch Available)

Thanks, [~kshukla]!  I committed this to trunk, branch-3.2, branch-3.1, and 
branch-3.0.  The patch does not apply to branch-2, so if you are interested in 
having the patch applied to 2.x as well feel free to reopen this JIRA to post a 
2.x patch.

> FileOutputCommitter does not report progress while merging paths.
> -
>
> Key: MAPREDUCE-7164
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5, 2.9.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: MAPREDUCE-7164.001.patch, MAPREDUCE-7164.002.patch
>
>
> In cases where the rename and merge path logic takes more time than usual, 
> the committer does not report progress and can cause job failure. This 
> behavior was not present in Hadoop 1.x. This JIRA will fix it so that the old 
> behavior for 1.x is restored.






[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-28 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702377#comment-16702377
 ] 

Jason Lowe commented on MAPREDUCE-7164:
---

Thanks for updating the patch!  +1 lgtm.  Committing this.

> FileOutputCommitter does not report progress while merging paths.
> -
>
> Key: MAPREDUCE-7164
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5, 2.9.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: MAPREDUCE-7164.001.patch, MAPREDUCE-7164.002.patch
>
>
> In cases where the rename and merge path logic takes more time than usual, 
> the committer does not report progress and can cause job failure. This 
> behavior was not present in Hadoop 1.x. This JIRA will fix it so that the old 
> behavior for 1.x is restored.






[jira] [Commented] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-27 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701077#comment-16701077
 ] 

Jason Lowe commented on MAPREDUCE-7164:
---

Thanks for the patch!  I think it would be fine to downcast as necessary, with 
{{instanceof Progressable}} checks, skipping the progress update if the context 
is not progressable.  That way if someone uses file output committer algorithm 
v1, which does _not_ have a progress indicator (since this occurs in the AM 
rather than in task attempts), it still does the right thing.  Similarly, if 
something ends up calling the JobContext form of the constructor but does pass 
a context that is Progressable, then it also continues to do the right thing.  
A simple utility function that takes the JobContext, does the instance check, 
and calls progress if possible would make this a lot cleaner, since there's 
only one place where it would need to do the downcast.


> FileOutputCommitter does not report progress while merging paths.
> -
>
> Key: MAPREDUCE-7164
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.3, 2.8.5, 2.9.2
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: MAPREDUCE-7164.001.patch
>
>
> In cases where the rename and merge path logic takes more time than usual, 
> the committer does not report progress and can cause job failure. This 
> behavior was not present in Hadoop 1.x. This JIRA will fix it so that the old 
> behavior for 1.x is restored.






[jira] [Updated] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-11-07 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7148:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks to [~tiana528] for the contribution and to [~ste...@apache.org] and 
[~ozawa] for additional review!  I committed this to trunk.

> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch, 
> MAPREDUCE-7148.009.patch, MAPREDUCE-7148.010.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In a particularly bad case, 
> a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.






[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-11-07 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678291#comment-16678291
 ] 

Jason Lowe commented on MAPREDUCE-7148:
---

The unit test failures are unrelated.  Surefire is complaining because every 
forked JVM is immediately failing with the inability to load the main class.

> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch, 
> MAPREDUCE-7148.009.patch, MAPREDUCE-7148.010.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In a particularly bad case, 
> a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.






[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-11-07 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678274#comment-16678274
 ] 

Jason Lowe commented on MAPREDUCE-7148:
---

Thanks for updating the patch!  +1 lgtm.  Committing this.

> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch, 
> MAPREDUCE-7148.009.patch, MAPREDUCE-7148.010.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In a particularly bad case, 
> a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.






[jira] [Updated] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

2018-11-06 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7156:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.3
   3.2.1
   3.3.0
   2.8.6
   3.1.2
   3.0.4
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks, [~pbacsko]!  I committed this to trunk, branch-3.2, branch-3.1, 
branch-3.0, branch-2, branch-2.9, and branch-2.8.

> NullPointerException when reaching max shuffle connections
> --
>
> Key: MAPREDUCE-7156
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 2.8.6, 3.3.0, 3.2.1, 2.9.3
>
> Attachments: MAPREDUCE-7156-001.patch, MAPREDUCE-7156-002.patch
>
>
>  When you hit the max number of shuffle connections, you can get a lot of 
> NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Skipping monitoring container container_e22_1531424278071_55040_01_002295 
> since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> at 
> 

[jira] [Commented] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

2018-11-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677428#comment-16677428
 ] 

Jason Lowe commented on MAPREDUCE-7156:
---

According to what I could scrape from the precommit workspace, it appears the 
unit tests are failing to run with this error:
{noformat}
# Created at 2018-11-06T18:11:22.668
Error: Could not find or load main class 
org.apache.maven.surefire.booter.ForkedBooter

# Created at 2018-11-06T18:11:22.882
Error: Could not find or load main class 
org.apache.maven.surefire.booter.ForkedBooter

# Created at 2018-11-06T18:11:23.073
Error: Could not find or load main class 
org.apache.maven.surefire.booter.ForkedBooter
{noformat}

Not sure how that is happening on the Jenkins host.

All the shuffle unit tests pass for me locally with the latest patch applied.  
+1 for the latest patch, committing this.


> NullPointerException when reaching max shuffle connections
> --
>
> Key: MAPREDUCE-7156
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7156-001.patch, MAPREDUCE-7156-002.patch
>
>
>  When you hit the max number of shuffle connections, you can get a lot of 
> NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Skipping monitoring container container_e22_1531424278071_55040_01_002295 
> since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than 

[jira] [Commented] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

2018-11-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677036#comment-16677036
 ] 

Jason Lowe commented on MAPREDUCE-7156:
---

Thanks for updating the patch!  Weird, the unit tests ran zero tests then 
failed.  It looks like the surefire JVM died somehow before it ran any tests.  
Kicked off another precommit run on this to see if it was a hiccup or not.

> NullPointerException when reaching max shuffle connections
> --
>
> Key: MAPREDUCE-7156
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7156-001.patch, MAPREDUCE-7156-002.patch
>
>
>  When you hit the max number of shuffle connections, you can get a lot of 
> NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Skipping monitoring container container_e22_1531424278071_55040_01_002295 
> since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> at 
> org.jboss.netty.handler.timeout.IdleStateHandler.writeComplete(IdleStateHandler.java:302)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
> at 
> 

[jira] [Commented] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections

2018-11-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16676898#comment-16676898
 ] 

Jason Lowe commented on MAPREDUCE-7156:
---

Thanks for the report and patch!  Curious, is there a reason not to just call 
the super method at the start of the method rather than separately in both code 
paths?
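
For illustration, a minimal sketch of the suggestion as it might look in a 
Netty 3 SimpleChannelUpstreamHandler (field and method names below are 
illustrative, not the patch):
{code:java}
@Override
public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt)
    throws Exception {
  // Invoke the super method once, up front, instead of once per branch.
  super.channelOpen(ctx, evt);
  if (acceptedConnections.incrementAndGet() > maxShuffleConnections) {
    ctx.getChannel().close();  // over the limit: reject the connection
    return;
  }
  // normal accept path continues from here
}
{code}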


> NullPointerException when reaching max shuffle connections
> --
>
> Key: MAPREDUCE-7156
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7156-001.patch
>
>
>  When you hit the max number of shuffle connections, you can get a lot of 
> NullPointerExceptions from Netty:
> {noformat}
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,311 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,312 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,316 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,317 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0x71187405, /10.17.226.11:44330 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,329 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Skipping monitoring container container_e22_1531424278071_55040_01_002295 
> since CPU usage is not yet available.
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,340 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> 2018-07-17 10:47:36,349 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error [id: 0xea8afd26, /10.17.202.18:43810 => /10.17.202.21:13562] 
> EXCEPTION: java.lang.NullPointerException
> 2018-07-17 10:47:36,361 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,390 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 10:47:36,395 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> {noformat}
> {noformat}
> 2018-07-17 13:58:28,263 INFO org.apache.hadoop.mapred.ShuffleHandler: Current 
> number of shuffle connections (360) is greater than or equal to the max 
> allowed shuffle connections (360)
> 2018-07-17 13:58:28,264 ERROR org.apache.hadoop.mapred.ShuffleHandler: 
> Shuffle error:
> java.lang.NullPointerException
> at 
> org.jboss.netty.handler.timeout.IdleStateHandler.writeComplete(IdleStateHandler.java:302)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:73)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at 
> 

[jira] [Assigned] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-11-06 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-7148:
-

Assignee: Wang Yan  (was: Jason Lowe)

Thanks for updating the patch! Apologies for the delay in re-reviewing; I've 
been very busy lately.

Just one last nit with the rework: there's now a lot of redundant code in 
reportError.  All three code paths do the same thing, with only a boolean 
difference for whether the job should fail.  It would be easier to read and 
maintain if the number of code paths were reduced by computing whether the job 
should fast-fail in a boolean and then unconditionally calling 
umbilical.fatalError, e.g.:
{code:java}
boolean fastFailJob = false;
[...]
if (hasClusterStorageCapacityExceededException) {
  [...]
  if (killJobWhenExceedClusterStorageCapacity) {
    LOG.error(...);
    fastFailJob = true;
  }
}
umbilical.fatalError(..., fastFailJob);
{code}

> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch, 
> MAPREDUCE-7148.009.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In a particularly bad case, 
> a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.






[jira] [Assigned] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-10-30 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-7148:
-

Assignee: Wang Yan  (was: Jason Lowe)

> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In a particularly bad case, 
> a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.






[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-10-30 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668838#comment-16668838
 ] 

Jason Lowe commented on MAPREDUCE-7148:
---

Thanks for updating the patch!  Looks good overall, just some cleanup nits.

I'm personally not a fan of LimitedPrivate.  I think it has limited (ha!) 
utility in practice.  For example, what is Tez supposed to do if it wants to 
implement the same feature?  Is it not allowed to do so until LimitedPrivate is 
removed from the class?  If so then we need to file a followup JIRA to remember 
to revisit this annotation.  I wonder if marking it Unstable is more useful 
than LimitedPrivate in practice, as a "buyer beware" for those who want to use 
it downstream and are willing to risk a future incompatibility in a later 
version of Hadoop.  Not a must-change, but I'm curious what 
[~ste...@apache.org] thinks about it, especially in the very likely scenario 
that Tez wants to replicate this feature.

reportError should not look up the conf value until it's necessary, i.e. until 
the exception is known to be the relevant type.

The difference between WARN and ERROR for the log level is subtle, and WARN is 
arguably an odd choice.  This error is fatal to the task, i.e. the entity 
emitting the log, so IMHO ERROR is the bare minimum level.  I'd much rather see 
the log message mention that it is requesting the job be terminated than expect 
users to notice a WARN vs. an ERROR to know the difference.

Nit: A lot of the tests are copy-n-paste.  A private method that takes the 
exception to throw and what to expect for the fail job flag would help.
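
For instance, a sketch of such a helper (Mockito-style, with assumed 
signatures):
{code:java}
private void verifyReportError(Exception cause, boolean expectFastFail)
    throws Exception {
  TaskUmbilicalProtocol umbilical = mock(TaskUmbilicalProtocol.class);
  [...]  // arrange the reporter and invoke reportError with `cause`
  verify(umbilical).fatalError(any(TaskAttemptID.class), anyString(),
      eq(expectFastFail));
}
{code}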


> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch, MAPREDUCE-7148.005.patch, 
> MAPREDUCE-7148.006.patch, MAPREDUCE-7148.007.patch, MAPREDUCE-7148.008.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In a particularly bad case, 
> a job had a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times before the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.






[jira] [Commented] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks

2018-10-29 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667178#comment-16667178
 ] 

Jason Lowe commented on MAPREDUCE-7152:
---

Can this be solved by simply changing:
{noformat}
$HADOOP_COMMON_HOME/lib/native
{noformat}
to
{noformat}
{{HADOOP_COMMON_HOME}}/lib/native
{noformat}
so the expansion of HADOOP_COMMON_HOME is not done by the job client but by the 
NM when the container is run?
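
In job configuration terms it would look something like this (a sketch only; 
whether the NM-side expansion handles this form is exactly the open question):
{code:java}
Configuration conf = new Configuration();
// Defer expansion to the NM at container launch via the {{VAR}} form.
conf.set("mapreduce.admin.user.env",
    "LD_LIBRARY_PATH={{HADOOP_COMMON_HOME}}/lib/native");
{code}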

> LD_LIBRARY_PATH is always passed from MR AM to tasks
> 
>
> Key: MAPREDUCE-7152
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7152-NMAdminEnvPOC_POC01.patch, 
> MAPREDUCE-7152-lazyEval_POC01.patch
>
>
> {{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default 
> in Hadoop (as part of {{mapreduce.admin.user.env}} and 
> {{yarn.app.mapreduce.am.user.env}}), and passed as an environment variable 
> from AM container to task containers in the container launch context.
> In cases where {{HADOOP_COMMON_HOME}} is different in AM node and task node, 
> tasks will fail to load native library. A reliable way to fix this is to add 
> {{LD_LIBRARY_PATH}} in {{yarn.nodemanager.admin-env}} instead.
> Another approach is to perform a lazy evaluation of {{LD_LIBRARY_PATH}} on 
> the NM side.






[jira] [Commented] (MAPREDUCE-7155) TestHSAdminServer is failing

2018-10-22 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659744#comment-16659744
 ] 

Jason Lowe commented on MAPREDUCE-7155:
---

Per my comment on HADOOP-15836, I don't think we should be proliferating 
TreeSets here.

> TestHSAdminServer is failing
> 
>
> Key: MAPREDUCE-7155
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7155
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Jason Lowe
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: MAPREDUCE-7155.1.patch
>
>
> After HADOOP-15836 TestHSAdminServer has been failing consistently.  Sample 
> stacktraces to follow.






[jira] [Commented] (MAPREDUCE-7155) TestHSAdminServer is failing

2018-10-22 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659083#comment-16659083
 ] 

Jason Lowe commented on MAPREDUCE-7155:
---

{noformat}
[INFO] Running org.apache.hadoop.mapreduce.v2.hs.server.TestHSAdminServer
[ERROR] Tests run: 16, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.741 
s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.hs.server.TestHSAdminServer
[ERROR] 
testRefreshSuperUserGroups[0](org.apache.hadoop.mapreduce.v2.hs.server.TestHSAdminServer)
  Time elapsed: 0.057 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.mapreduce.v2.hs.server.TestHSAdminServer.testRefreshSuperUserGroups(TestHSAdminServer.java:208)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runners.Suite.runChild(Suite.java:127)
at org.junit.runners.Suite.runChild(Suite.java:26)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)

[ERROR] 
testRefreshSuperUserGroups[1](org.apache.hadoop.mapreduce.v2.hs.server.TestHSAdminServer)
  Time elapsed: 0.046 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.mapreduce.v2.hs.server.TestHSAdminServer.testRefreshSuperUserGroups(TestHSAdminServer.java:208)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 

[jira] [Created] (MAPREDUCE-7155) TestHSAdminServer is failing

2018-10-22 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-7155:
-

 Summary: TestHSAdminServer is failing
 Key: MAPREDUCE-7155
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7155
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Jason Lowe


After HADOOP-15836 TestHSAdminServer has been failing consistently.  Sample 
stacktraces to follow.






[jira] [Updated] (MAPREDUCE-7149) javadocs for FileInputFormat and OutputFormat to mention DT collection

2018-10-11 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7149:
--
Fix Version/s: 3.3.0

> javadocs for FileInputFormat and OutputFormat to mention DT collection
> --
>
> Key: MAPREDUCE-7149
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7149
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7149-001.patch
>
>
> The fact that DTs (delegation tokens) are collected for a job in 
> {{FileInputFormat.listStatus(JobConf job)}} and 
> {{OutputFormat.checkOutputSpecs}} is not mentioned in the javadocs, nor is it 
> obvious when you look at the API.
> Add a sentence to the javadocs of the relevant methods to make clear that "the 
> job you pass in is altered; anyone subclassing needs to call their superclass 
> or do something similar".
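>
> For illustration, a hedged sketch of the sort of wording such a note could 
> take (an assumption about phrasing, not the committed javadoc text):
> {code:java}
> /**
>  * ...existing javadoc for FileInputFormat.listStatus(JobConf)...
>  * <p>
>  * Note: as a side effect this method collects delegation tokens for the
>  * job's credentials, so the JobConf passed in is altered. Subclasses
>  * overriding this method must invoke the superclass implementation (or
>  * collect the tokens themselves).
>  */
> {code}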



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7149) javadocs for FileInputFormat and OutputFormat to mention DT collection

2018-10-11 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646697#comment-16646697
 ] 

Jason Lowe commented on MAPREDUCE-7149:
---

Thanks for the patch!  Other than the whitespace nit, +1 lgtm.

> javadocs for FileInputFormat and OutputFormat to mention DT collection
> --
>
> Key: MAPREDUCE-7149
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7149
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: MAPREDUCE-7149-001.patch
>
>
> The fact that DTs (delegation tokens) are collected for a job in 
> {{FileInputFormat.listStatus(JobConf job)}} and 
> {{OutputFormat.checkOutputSpecs}} is not mentioned in the javadocs, nor is it 
> obvious when you look at the API.
> Add a sentence to the javadocs of the relevant methods to make clear that "the 
> job you pass in is altered; anyone subclassing needs to call their superclass 
> or do something similar".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-10-10 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645034#comment-16645034
 ] 

Jason Lowe commented on MAPREDUCE-7148:
---

Unrecoverable works as long as the reader understands the "recovery" part is 
referring to tasks and not the error on the filesystem being reported.  Maybe 
ClusterStorageCapacityExceededException?  I'm terrible with names -- as long as 
everyone's on the same page as to what it really means I'm OK with it.  Or as 
you suggest, a boolean predicate like isNodeLocal() can work too.  The key is 
that someone who implements a new FileSystem can easily know when it's 
appropriate to throw an exception of this type and when it's not.
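
For illustration only, a minimal sketch of the fail-fast check under 
discussion. The exception class and config key are placeholder names from this 
thread, not a committed API:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;

class FastFailWriter {
  // ClusterStorageCapacityExceededException is hypothetical here.
  static void write(FSDataOutputStream out, byte[] buf, int len,
      Configuration conf) throws IOException {
    try {
      out.write(buf, 0, len);
    } catch (ClusterStorageCapacityExceededException e) {
      // Master enable: only fail fast when explicitly turned on.
      if (conf.getBoolean("mapreduce.job.storage.capacity.fail-fast", false)) {
        // Surface a fatal error so the AM fails the job instead of
        // rescheduling an attempt doomed to hit the same quota again.
        throw new IOException("Cluster storage quota exceeded; failing fast", e);
      }
      throw e;
    }
  }
}
{code}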


> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In one worse case, we had a 
> job with a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times until the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7148) Fast fail jobs when exceeds dfs quota limitation

2018-10-09 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644118#comment-16644118
 ] 

Jason Lowe commented on MAPREDUCE-7148:
---

Thanks for the patch!  Seems like a reasonable request and approach.

Not sure about the StorageCapacityExceededException name.  It implies any kind 
of storage capacity error could fall underneath it, like full local disk or a 
local disk quota which _is_ something we would want to retry.  Not an issue for 
the current patch, more of a maintenance concern if other filesystems decide to 
start using it as part of their exception repertoire.  Depending upon the type 
of filesystem it could or could not be appropriate for this feature.  I'm not 
sure offhand what a better name would be, just pointing out the potential for 
confusion there.

Since this is a new feature, there really should be unit tests: one verifying 
that the master enable can disable the feature, and one verifying that the 
feature works when the master enable is on.
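
A hedged sketch of what such tests might look like; the config key and the 
shouldFailFast helper are hypothetical names, not the committed test code:

{code:java}
@Test
public void testMasterEnableTogglesFastFail() {
  Configuration conf = new Configuration();

  // Master enable off: a capacity error should remain retriable.
  conf.setBoolean("mapreduce.job.storage.capacity.fail-fast", false);
  assertFalse(shouldFailFast(conf, new IOException("quota exceeded")));

  // Master enable on: the same error should fail the job fast.
  conf.setBoolean("mapreduce.job.storage.capacity.fail-fast", true);
  assertTrue(shouldFailFast(conf, new IOException("quota exceeded")));
}
{code}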


> Fast fail jobs when exceeds dfs quota limitation
> 
>
> Key: MAPREDUCE-7148
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7148
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.0, 2.8.0, 2.9.0
> Environment: hadoop 2.7.3
>Reporter: Wang Yan
>Assignee: Wang Yan
>Priority: Major
> Attachments: MAPREDUCE-7148.001.patch, MAPREDUCE-7148.002.patch, 
> MAPREDUCE-7148.003.patch, MAPREDUCE-7148.004.patch
>
>
> We are running Hive jobs with a DFS quota limitation per job (3 TB). If a job 
> hits the DFS quota limitation, the task that hit it will fail and there will 
> be a few task retries before the job actually fails. The retries are not very 
> helpful because the job will always fail anyway. In one worse case, we had a 
> job with a single reduce task writing more than 3 TB to HDFS over 20 hours; 
> the reduce task exceeded the quota limitation and retried 4 times until the 
> job finally failed, consuming a lot of unnecessary resources. This ticket 
> aims at providing a feature to let a job fail fast when it writes too much 
> data to the DFS and exceeds the DFS quota limitation. The fast-fail mechanism 
> was introduced in MAPREDUCE-7022 and MAPREDUCE-6489.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7130) Rumen crashes trying to handle MRAppMaster recovery events

2018-10-09 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7130:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks, [~pbacsko]!  I committed this to trunk.

> Rumen crashes trying to handle MRAppMaster recovery events
> --
>
> Key: MAPREDUCE-7130
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7130
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Jonathan Bender
>Assignee: Peter Bacsko
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7130-001.patch, MAPREDUCE-7130-002.patch
>
>
> In the event of an MRAppMaster recovery, the Job History file gets an event 
> of the following form:
> {code:json}
> {"type":"JOB_KILLED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobUnsuccessfulCompletion":{"jobid":"job_1532048817013_","finishTime":1534521962641,"finishedMaps":0,"finishedReduces":0,"jobStatus":"SUCCEEDED","diagnostics":{"string":"Job
>  commit succeeded in a prior MRAppMaster attempt before it crashed. 
> Recovering."},"failedMaps":0,"failedReduces":0,"killedMaps":0,"killedReduces":0}}}
> {code}
> The issue seems to be around the SUCCEEDED job status for a 
> JobUnsuccessfulCompletion:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java#L609
> Which fails to find the enum here:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Pre21JobHistoryConstants.java#L50
> I'm not sure if this is an error with the Rumen parser or if the job history 
> file is getting into an invalid state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7130) Rumen crashes trying to handle MRAppMaster recovery events

2018-10-09 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643914#comment-16643914
 ] 

Jason Lowe commented on MAPREDUCE-7130:
---

Thanks for the patch!  +1 lgtm.  Committing this.


> Rumen crashes trying to handle MRAppMaster recovery events
> --
>
> Key: MAPREDUCE-7130
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7130
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Jonathan Bender
>Assignee: Peter Bacsko
>Priority: Minor
> Attachments: MAPREDUCE-7130-001.patch, MAPREDUCE-7130-002.patch
>
>
> In the event of an MRAppMaster recovery, the Job History file gets an event 
> of the following form:
> {code:json}
> {"type":"JOB_KILLED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobUnsuccessfulCompletion":{"jobid":"job_1532048817013_","finishTime":1534521962641,"finishedMaps":0,"finishedReduces":0,"jobStatus":"SUCCEEDED","diagnostics":{"string":"Job
>  commit succeeded in a prior MRAppMaster attempt before it crashed. 
> Recovering."},"failedMaps":0,"failedReduces":0,"killedMaps":0,"killedReduces":0}}}
> {code}
> The issue seems to be around the SUCCEEDED job status for a 
> JobUnsuccessfulCompletion:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java#L609
> Which fails to find the enum here:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Pre21JobHistoryConstants.java#L50
> I'm not sure if this is an error with the Rumen parser or if the job history 
> file is getting into an invalid state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7130) Rumen crashes trying to handle MRAppMaster recovery events

2018-10-08 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641819#comment-16641819
 ] 

Jason Lowe commented on MAPREDUCE-7130:
---

+1.  This seems the most likely to fix the issue without introducing a lot more 
risk/issues.


> Rumen crashes trying to handle MRAppMaster recovery events
> --
>
> Key: MAPREDUCE-7130
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7130
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Jonathan Bender
>Priority: Minor
>
> In the event of an MRAppMaster recovery, the Job History file gets an event 
> of the following form:
> {code:json}
> {"type":"JOB_KILLED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobUnsuccessfulCompletion":{"jobid":"job_1532048817013_","finishTime":1534521962641,"finishedMaps":0,"finishedReduces":0,"jobStatus":"SUCCEEDED","diagnostics":{"string":"Job
>  commit succeeded in a prior MRAppMaster attempt before it crashed. 
> Recovering."},"failedMaps":0,"failedReduces":0,"killedMaps":0,"killedReduces":0}}}
> {code}
> The issue seems to be around the SUCCEEDED job status for a 
> JobUnsuccessfulCompletion:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java#L609
> Which fails to find the enum here:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Pre21JobHistoryConstants.java#L50
> I'm not sure if this is an error with the Rumen parser or if the job history 
> file is getting into an invalid state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7130) Rumen crashes trying to handle MRAppMaster recovery events

2018-10-01 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634684#comment-16634684
 ] 

Jason Lowe commented on MAPREDUCE-7130:
---

I'm not a Rumen expert and I haven't dug into the Rumen side for this yet.  I'm 
wondering why Rumen is trying to apply pre-Hadoop 0.21 enums to a version well 
past Hadoop 0.21.  Maybe the "Pre21" is a misnomer?

It looks like this may have been triggered by MAPREDUCE-5795.  Before that it 
always emitted "KILLED" as the status of an unsuccessful job completion, but 
that code changed it to emit SUCCEEDED in some cases.  I agree that SUCCEEDED 
has been used for a job status for a really long time now, since before 2011.  
So my initial reaction is Rumen shouldn't be applying a pre-Hadoop 0.21 enum to 
this field at all, certainly not for MRAppMaster recovery files which didn't 
even exist pre-Hadoop 0.21.  It seems like it should be using the JobStatus 
enum for this instead.  Certainly adding SUCCEEDED to the enum or something 
similar would also work, but again I'm wondering why it's using a custom enum 
set here.
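
For illustration, a hedged sketch of the narrower option (tolerating SUCCEEDED 
in the parse); mapping it to the pre-0.21 SUCCESS value is an assumption, and 
the eventual patch may take the JobStatus route instead:

{code:java}
// Hypothetical sketch, not the committed fix.
private static Pre21JobHistoryConstants.Values parseStatus(String status) {
  try {
    return Pre21JobHistoryConstants.Values.valueOf(status);
  } catch (IllegalArgumentException e) {
    // Since MAPREDUCE-5795 a recovered job emits SUCCEEDED inside a
    // JobUnsuccessfulCompletion event; the pre-0.21 enum has no such
    // constant, so map it rather than crashing the parse.
    if ("SUCCEEDED".equals(status)) {
      return Pre21JobHistoryConstants.Values.SUCCESS;  // assumed constant
    }
    throw e;
  }
}
{code}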


> Rumen crashes trying to handle MRAppMaster recovery events
> --
>
> Key: MAPREDUCE-7130
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7130
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Jonathan Bender
>Priority: Minor
>
> In the event of an MRAppMaster recovery, the Job History file gets an event 
> of the following form:
> {code:json}
> {"type":"JOB_KILLED","event":{"org.apache.hadoop.mapreduce.jobhistory.JobUnsuccessfulCompletion":{"jobid":"job_1532048817013_","finishTime":1534521962641,"finishedMaps":0,"finishedReduces":0,"jobStatus":"SUCCEEDED","diagnostics":{"string":"Job
>  commit succeeded in a prior MRAppMaster attempt before it crashed. 
> Recovering."},"failedMaps":0,"failedReduces":0,"killedMaps":0,"killedReduces":0}}}
> {code}
> The issue seems to be around the SUCCEEDED job status for a 
> JobUnsuccessfulCompletion:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/JobBuilder.java#L609
> Which fails to find the enum here:
> https://github.com/apache/hadoop/blob/e0f6ffdbad6f43fd43ec57fb68ebf5275b8b9ba0/hadoop-tools/hadoop-rumen/src/main/java/org/apache/hadoop/tools/rumen/Pre21JobHistoryConstants.java#L50
> I'm not sure if this is an error with the Rumen parser or if the job history 
> file is getting into an invalid state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Moved] (MAPREDUCE-7147) Review of LocalContainerAllocator

2018-09-28 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved YARN-8831 to MAPREDUCE-7147:
-

Affects Version/s: (was: 3.2.0)
   3.2.0
  Component/s: (was: applications)
   mr-am
  Key: MAPREDUCE-7147  (was: YARN-8831)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Review of LocalContainerAllocator
> -
>
> Key: MAPREDUCE-7147
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7147
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: YARN-8831.1.patch
>
>
> Some trivial cleanup of class {{LocalContainerAllocator}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Moved] (MAPREDUCE-7146) Review of RMCommunicator Class

2018-09-28 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved YARN-8832 to MAPREDUCE-7146:
-

Affects Version/s: (was: 3.2.0)
   3.2.0
  Component/s: (was: applications)
   mr-am
  Key: MAPREDUCE-7146  (was: YARN-8832)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Review of RMCommunicator Class
> --
>
> Key: MAPREDUCE-7146
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7146
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-88321.patch
>
>
> Various improvements to the {{RMCommunicator}} class.
>  
>  * Use SLF4J parameterized logging
>  * Use a switch statement instead of {{if}}-{{else}} statements
>  * Remove the anti-pattern of "log and throw" (just throw)
>  * Use a flag to stop the thread instead of an interrupt (the interrupt may 
> be interrupting the heartbeat code and not the thread loop)
>  * The main thread loop repeatedly drains the heartbeat callback queue until 
> the queue is empty.  It's technically possible that other threads could 
> constantly put new callbacks into the queue, and therefore the main thread 
> would never progress past the callbacks.  Put a cap on the number of 
> callbacks that will be processed in any iteration (see the sketch below).
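>
> A hedged sketch of the last two bullets; the names are placeholders, not the 
> actual RMCommunicator code:
> {code:java}
> // Illustrative only: a volatile stop flag instead of interrupts, and a
> // per-iteration cap on how many heartbeat callbacks are drained.
> class AllocatorLoop implements Runnable {
>   private static final int MAX_CALLBACKS_PER_ITERATION = 100;
>   private final java.util.concurrent.ConcurrentLinkedQueue<Runnable>
>       heartbeatCallbacks = new java.util.concurrent.ConcurrentLinkedQueue<>();
>   private volatile boolean stopped = false;  // flag instead of interrupt
>
>   void stop() { stopped = true; }
>
>   @Override
>   public void run() {
>     while (!stopped) {
>       // heartbeat() would run here; omitted in this sketch.
>       // Cap the drain so other threads cannot starve the loop by
>       // continually enqueueing new callbacks.
>       int processed = 0;
>       Runnable cb;
>       while (processed++ < MAX_CALLBACKS_PER_ITERATION
>           && (cb = heartbeatCallbacks.poll()) != null) {
>         cb.run();
>       }
>     }
>   }
> }
> {code}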



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7138) ThrottledContainerAllocator in MRAppBenchmark should implement RMHeartbeatHandler

2018-09-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7138:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.6
   3.1.2
   2.9.2
   3.0.4
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks, [~oshevchenko]!  I committed this to trunk, branch-3.1, branch-3.0, 
branch-2, branch-2.9, and branch-2.8.

> ThrottledContainerAllocator in MRAppBenchmark should implement 
> RMHeartbeatHandler
> -
>
> Key: MAPREDUCE-7138
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7138
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.0.4, 2.9.2, 3.1.2, 2.8.6
>
> Attachments: MAPREDUCE-7138.001.patch, MAPREDUCE-7138.002.patch, 
> MAPREDUCE-7138.003.patch
>
>
> MRAppBenchmark#benchmark2 test fails with the following exception:
> {noformat}
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark$ThrottledMRApp$ThrottledContainerAllocator
>  cannot be cast to org.apache.hadoop.mapreduce.v2.app.rm.RMHeartbeatHandler
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getRMHeartbeatHandler(MRAppMaster.java:718)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRApp.createCommitterEventHandler(MRApp.java:665)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:391)
>   at org.apache.hadoop.mapreduce.v2.app.MRApp.serviceInit(MRApp.java:266)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:294)
>   at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:279)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.run(MRAppBenchmark.java:69)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.benchmark2(MRAppBenchmark.java:268)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> {noformat}
> since ThrottledContainerAllocator doesn't implement RMHeartbeatHandler 
> interface.
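>
> A hedged sketch of the shape of the fix (not the committed patch): the test 
> allocator additionally implements RMHeartbeatHandler so the cast in 
> MRAppMaster.getRMHeartbeatHandler() succeeds. The method bodies below are 
> assumptions:
> {code:java}
> class ThrottledContainerAllocator extends AbstractService
>     implements ContainerAllocator, RMHeartbeatHandler {
>
>   @Override
>   public long getLastHeartbeatTime() {
>     // The benchmark has no real RM heartbeats; report "now".
>     return System.currentTimeMillis();
>   }
>
>   @Override
>   public void runOnNextHeartbeat(Runnable callback) {
>     // No heartbeat to wait for; run the callback inline.
>     callback.run();
>   }
>
>   // ...existing throttled allocation logic unchanged...
> }
> {code}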



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7137) MRAppBenchmark.benchmark1() fails with NullPointerException

2018-09-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7137:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.6
   3.1.2
   2.9.2
   3.0.4
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks, [~oshevchenko]!  I committed this to trunk, branch-3.1, branch-3.0, 
branch-2, branch-2.9, and branch-2.8.

> MRAppBenchmark.benchmark1() fails with NullPointerException
> ---
>
> Key: MAPREDUCE-7137
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7137
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.0.4, 2.9.2, 3.1.2, 2.8.6
>
> Attachments: MAPREDUCE-7137.001.patch
>
>
> MRAppBenchmark.benchmark1() fails with NullPointerException:
> 1. We do not set any queue for this test. As a result we get the following 
> exception:
> {noformat}
> 2018-09-10 17:04:23,486 ERROR [Thread-0] rm.RMCommunicator 
> (RMCommunicator.java:register(177)) - Exception while registering
> java.lang.NullPointerException
> at org.apache.avro.util.Utf8$2.toUtf8(Utf8.java:123)
> at org.apache.avro.util.Utf8.getBytesFor(Utf8.java:172)
> at org.apache.avro.util.Utf8.<init>(Utf8.java:39)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobQueueChangeEvent.<init>(JobQueueChangeEvent.java:35)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.setQueueName(JobImpl.java:1167)
> at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:174)
> at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
> at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1293)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:301)
> at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:285)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.run(MRAppBenchmark.java:72)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.benchmark1(MRAppBenchmark.java:194)
> {noformat}
> 2. We override createSchedulerProxy method and do not set application 
> priority that was added later by MAPREDUCE-6515. We got the following error:
> {noformat}
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In both cases, the job will never run and the test gets stuck and never 
> finishes.
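>
> A hedged sketch of a test-side workaround for the first NPE (the committed 
> fix may differ, e.g. it could set defaults inside MRApp itself); the second 
> NPE additionally needs the stubbed scheduler to return a non-null application 
> priority:
> {code:java}
> // Hypothetical fragment; "app" is the MRAppBenchmark's MRApp under test.
> Configuration conf = new Configuration();
> // Avoids the Utf8 NPE in JobImpl.setQueueName() during registration.
> conf.set(MRJobConfig.QUEUE_NAME, "default");
> Job job = app.submit(conf);
> {code}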



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7138) ThrottledContainerAllocator in MRAppBenchmark should implement RMHeartbeatHandler

2018-09-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619790#comment-16619790
 ] 

Jason Lowe commented on MAPREDUCE-7138:
---

Thanks for the report and patch!  +1 lgtm.  Committing this.

> ThrottledContainerAllocator in MRAppBenchmark should implement 
> RMHeartbeatHandler
> -
>
> Key: MAPREDUCE-7138
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7138
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: MAPREDUCE-7138.001.patch, MAPREDUCE-7138.002.patch, 
> MAPREDUCE-7138.003.patch
>
>
> MRAppBenchmark#benchmark2 test fails with the following exception:
> {noformat}
> java.lang.ClassCastException: 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark$ThrottledMRApp$ThrottledContainerAllocator
>  cannot be cast to org.apache.hadoop.mapreduce.v2.app.rm.RMHeartbeatHandler
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.getRMHeartbeatHandler(MRAppMaster.java:718)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRApp.createCommitterEventHandler(MRApp.java:665)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:391)
>   at org.apache.hadoop.mapreduce.v2.app.MRApp.serviceInit(MRApp.java:266)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:294)
>   at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:279)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.run(MRAppBenchmark.java:69)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.benchmark2(MRAppBenchmark.java:268)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
> {noformat}
> since ThrottledContainerAllocator doesn't implement RMHeartbeatHandler 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7137) MRAppBenchmark.benchmark1() fails with NullPointerException

2018-09-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619783#comment-16619783
 ] 

Jason Lowe commented on MAPREDUCE-7137:
---

Thanks for the report and patch!

+1 for the patch, although I noted that this alone doesn't allow the unit test 
to pass.  As soon as this is fixed it leads to the problem reported in 
MAPREDUCE-7138.  In the future it would be simpler and less overhead to fix all 
the issues with the test in one patch, especially since the changes are so 
small.


> MRAppBenchmark.benchmark1() fails with NullPointerException
> ---
>
> Key: MAPREDUCE-7137
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7137
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: MAPREDUCE-7137.001.patch
>
>
> MRAppBenchmark.benchmark1() fails with NullPointerException:
> 1. We do not set any queue for this test. As a result we get the following 
> exception:
> {noformat}
> 2018-09-10 17:04:23,486 ERROR [Thread-0] rm.RMCommunicator 
> (RMCommunicator.java:register(177)) - Exception while registering
> java.lang.NullPointerException
> at org.apache.avro.util.Utf8$2.toUtf8(Utf8.java:123)
> at org.apache.avro.util.Utf8.getBytesFor(Utf8.java:172)
> at org.apache.avro.util.Utf8.<init>(Utf8.java:39)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobQueueChangeEvent.<init>(JobQueueChangeEvent.java:35)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.setQueueName(JobImpl.java:1167)
> at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:174)
> at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
> at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1293)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:301)
> at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:285)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.run(MRAppBenchmark.java:72)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppBenchmark.benchmark1(MRAppBenchmark.java:194)
> {noformat}
> 2. We override createSchedulerProxy method and do not set application 
> priority that was added later by MAPREDUCE-6515. We got the following error:
> {noformat}
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In both cases, the job will never run and the test gets stuck and never 
> finishes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-3801) org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator fails intermittently

2018-09-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619457#comment-16619457
 ] 

Jason Lowe commented on MAPREDUCE-3801:
---

No unit test since this is fixing the code to get an existing flaky test to 
pass.

> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently
> --
>
> Key: MAPREDUCE-3801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-3801.001.patch, 
> TEST-org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.xml, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators-output.txt, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.txt
>
>
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-3801) org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator fails intermittently

2018-09-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-3801:
--
Attachment: MAPREDUCE-3801.001.patch

> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently
> --
>
> Key: MAPREDUCE-3801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-3801.001.patch, 
> TEST-org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.xml, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators-output.txt, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.txt
>
>
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-3801) org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator fails intermittently

2018-09-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-3801:
--
Target Version/s: 2.10.0, 3.2.0, 3.0.4, 2.9.2, 3.1.2, 2.8.6
  Status: Patch Available  (was: Open)

> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently
> --
>
> Key: MAPREDUCE-3801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-3801.001.patch, 
> TEST-org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.xml, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators-output.txt, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.txt
>
>
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-3801) org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator fails intermittently

2018-09-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-3801:
-

Assignee: Jason Lowe

Ran into this again, as it started failing semi-reliably on our test machines.

Not sure this is the complete issue, but certainly part of the problem is the 
speculator.eventQueueEmpty() call in the test to wait for any pending 
speculations.  That method is checking a queue _that is never used_.  The queue 
is therefore always empty, so the test never waits.  The method should be 
waiting on the scanControl queue rather than the unused eventQueue.
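
A hedged sketch of the described change (names are taken from the comment 
above; the committed patch may do more):

{code:java}
// In the test speculator: report whether work is still pending by checking
// the queue that is actually used (scanControl), not the never-populated
// eventQueue.
public boolean eventQueueEmpty() {
  return scanControl.isEmpty();
}
{code}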


> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently
> --
>
> Key: MAPREDUCE-3801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3801
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Jason Lowe
>Priority: Major
> Attachments: 
> TEST-org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.xml, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators-output.txt, 
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.txt
>
>
> org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator
>  fails intermittently



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7140) Refactoring TaskAttemptInfo to separate Map and Reduce tasks

2018-09-14 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7140:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks, [~oshevchenko]!  I committed this to trunk.

> Refactoring TaskAttemptInfo to separate Map and Reduce tasks
> 
>
> Key: MAPREDUCE-7140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7140
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7140.001.patch
>
>
> Filed as a separate improvement per conversation in MAPREDUCE-7133.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7140) Refactoring TaskAttemptInfo to separate Map and Reduce tasks

2018-09-14 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615317#comment-16615317
 ] 

Jason Lowe commented on MAPREDUCE-7140:
---

Thanks for the patch!  +1 lgtm.  Committing this.

> Refactoring TaskAttemptInfo to separate Map and Reduce tasks
> 
>
> Key: MAPREDUCE-7140
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7140
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: MAPREDUCE-7140.001.patch
>
>
> Filed as a separate improvement per conversation in MAPREDUCE-7133.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6440) Duplicate Key in Json Output for Job details

2018-09-13 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved MAPREDUCE-6440.
---
  Resolution: Duplicate
Target Version/s:   (was: )

This has been fixed by MAPREDUCE-7133.

> Duplicate Key in Json Output for Job details
> 
>
> Key: MAPREDUCE-6440
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6440
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Anushri
>Priority: Minor
>
> Duplicate key in Json Output for Job details for the url : 
> http://:/ws/v1/history/mapreduce/jobs/job_id/tasks/task_id/attempts
> If the task type is "REDUCE" the json output for this url contains duplicate 
> key for "type".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7133) History Server task attempts REST API returns invalid data

2018-09-13 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7133:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.6
   3.1.2
   2.9.2
   3.0.4
   2.7.8
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks, [~oshevchenko]!  I committed this to trunk, branch-3.1, branch-3.0, 
branch-2, branch-2.9, branch-2.8, and branch-2.7.

> History Server task attempts REST API returns invalid data
> --
>
> Key: MAPREDUCE-7133
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.7.8, 3.0.4, 2.9.2, 3.1.2, 2.8.6
>
> Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch, 
> MAPREDUCE-7133.003.patch, MAPREDUCE-7133.004.patch, MAPREDUCE-7133.005.patch
>
>
> When we send a request to the History Server with the header Accept: 
> application/json 
> [https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts]
>  
> we get the following JSON:
> {code:java}
> {
> "taskAttempts": {
> "taskAttempt": [{
> "type": "reduceTaskAttemptInfo",
> "startTime": 1535372984638,
> "finishTime": 1535372986149,
> "elapsedTime": 1511,
> "progress": 100.0,
> "id": "attempt_1535363926925_0040_r_03_0",
> "rack": "/default-rack",
> "state": "SUCCEEDED",
> "status": "reduce > reduce",
> "nodeHttpAddress": "node2.cluster.com:8044",
> "diagnostics": "",
> "type": "REDUCE",
> "assignedContainerId": "container_e01_1535363926925_0040_01_06",
> "shuffleFinishTime": 1535372986056,
> "mergeFinishTime": 1535372986075,
> "elapsedShuffleTime": 1418,
> "elapsedMergeTime": 19,
> "elapsedReduceTime": 74
> }]
> }
> }
> {code}
> As you can see, the "type" property is duplicated:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, as the JSON is not valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:type="reduceTaskAttemptInfo">
>   <startTime>1535372984638</startTime>
>   <finishTime>1535372986149</finishTime>
>   <elapsedTime>1511</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0040_r_03_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>reduce &gt; reduce</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>REDUCE</type>
>   <assignedContainerId>container_e01_1535363926925_0040_01_06</assignedContainerId>
>   <shuffleFinishTime>1535372986056</shuffleFinishTime>
>   <mergeFinishTime>1535372986075</mergeFinishTime>
>   <elapsedShuffleTime>1418</elapsedShuffleTime>
>   <elapsedMergeTime>19</elapsedMergeTime>
>   <elapsedReduceTime>74</elapsedReduceTime>
> </taskAttempt>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:type="reduceTaskAttemptInfo">
> {code}
> We got an "xsi:type" attribute which is later incorrectly marshalled into a 
> duplicated field when we use the JSON format.
> This applies only to REDUCE tasks. For a MAP task we get XML without the 
> "xsi:type" attribute.
> {code:java}
> <taskAttempt>
>   <startTime>1535370756528</startTime>
>   <finishTime>1535370760318</finishTime>
>   <elapsedTime>3790</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0029_m_01_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>map &gt; sort</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>MAP</type>
>   <assignedContainerId>container_e01_1535363926925_0029_01_03</assignedContainerId>
> </taskAttempt>
> {code}
> This happens because we have two classes in one hierarchy: TaskAttemptInfo 
> for MAP tasks and ReduceTaskAttemptInfo for REDUCE tasks.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, and we later marshal all tasks 
> (map and reduce) via TaskAttemptsInfo.getTaskAttempt(). At that point we do 
> not have any static type information about ReduceTaskAttemptInfo, as we store 
> all tasks in an ArrayList. 
> During marshalling JAXB sees that the actual type of the task is 
> ReduceTaskAttemptInfo instead of TaskAttemptInfo and adds meta information 
> for it. That's why we get the duplicated field.
> Unfortunately we did not catch this earlier in TestHsWebServicesAttempts, 
> since we use the org.codehaus.jettison.json.JSONObject library, which 
> silently overwrites duplicated fields. Even when we use Postman to make the 
> request we get valid JSON; only when we change the representation type to Raw 
> can we notice this issue. We are also able to reproduce this bug by using the 
> "org.json:json" lib:
> Something like this:
> {code:java}
> BufferedReader inReader = new BufferedReader(
>     new InputStreamReader(connection.getInputStream()));
> String inputLine;
> StringBuilder response = new StringBuilder();
> while ((inputLine = inReader.readLine()) != null) {
>   response.append(inputLine);
> }
> inReader.close();
> // org.json's JSONObject rejects duplicate keys, so this throws for the bad response
> JSONObject o = new JSONObject(response.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MAPREDUCE-7133) History Server task attempts REST API returns invalid data

2018-09-13 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613971#comment-16613971
 ] 

Jason Lowe commented on MAPREDUCE-7133:
---

Thanks for updating the patch!  +1 lgtm.  Committing this.

> History Server task attempts REST API returns invalid data
> --
>
> Key: MAPREDUCE-7133
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch, 
> MAPREDUCE-7133.003.patch, MAPREDUCE-7133.004.patch, MAPREDUCE-7133.005.patch
>
>
> When we send a request to the History Server with the header Accept: 
> application/json 
> [https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts]
>  
> we get the following JSON:
> {code:java}
> {
> "taskAttempts": {
> "taskAttempt": [{
> "type": "reduceTaskAttemptInfo",
> "startTime": 1535372984638,
> "finishTime": 1535372986149,
> "elapsedTime": 1511,
> "progress": 100.0,
> "id": "attempt_1535363926925_0040_r_03_0",
> "rack": "/default-rack",
> "state": "SUCCEEDED",
> "status": "reduce > reduce",
> "nodeHttpAddress": "node2.cluster.com:8044",
> "diagnostics": "",
> "type": "REDUCE",
> "assignedContainerId": "container_e01_1535363926925_0040_01_06",
> "shuffleFinishTime": 1535372986056,
> "mergeFinishTime": 1535372986075,
> "elapsedShuffleTime": 1418,
> "elapsedMergeTime": 19,
> "elapsedReduceTime": 74
> }]
> }
> }
> {code}
> As you can see, the "type" property is duplicated:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, as the JSON is not valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:type="reduceTaskAttemptInfo">
>   <startTime>1535372984638</startTime>
>   <finishTime>1535372986149</finishTime>
>   <elapsedTime>1511</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0040_r_03_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>reduce &gt; reduce</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>REDUCE</type>
>   <assignedContainerId>container_e01_1535363926925_0040_01_06</assignedContainerId>
>   <shuffleFinishTime>1535372986056</shuffleFinishTime>
>   <mergeFinishTime>1535372986075</mergeFinishTime>
>   <elapsedShuffleTime>1418</elapsedShuffleTime>
>   <elapsedMergeTime>19</elapsedMergeTime>
>   <elapsedReduceTime>74</elapsedReduceTime>
> </taskAttempt>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:type="reduceTaskAttemptInfo">
> {code}
> We got an "xsi:type" attribute which is later incorrectly marshalled into a 
> duplicated field when we use the JSON format.
> This applies only to REDUCE tasks. For a MAP task we get XML without the 
> "xsi:type" attribute.
> {code:java}
> <taskAttempt>
>   <startTime>1535370756528</startTime>
>   <finishTime>1535370760318</finishTime>
>   <elapsedTime>3790</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0029_m_01_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>map &gt; sort</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>MAP</type>
>   <assignedContainerId>container_e01_1535363926925_0029_01_03</assignedContainerId>
> </taskAttempt>
> {code}
> This happens because we have two classes in one hierarchy: TaskAttemptInfo 
> for MAP tasks and ReduceTaskAttemptInfo for REDUCE tasks.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, and we later marshal all tasks 
> (map and reduce) via TaskAttemptsInfo.getTaskAttempt(). At that point we do 
> not have any static type information about ReduceTaskAttemptInfo, as we store 
> all tasks in an ArrayList. 
> During marshalling JAXB sees that the actual type of the task is 
> ReduceTaskAttemptInfo instead of TaskAttemptInfo and adds meta information 
> for it. That's why we get the duplicated field.
> Unfortunately we did not catch this earlier in TestHsWebServicesAttempts, 
> since we use the org.codehaus.jettison.json.JSONObject library, which 
> silently overwrites duplicated fields. Even when we use Postman to make the 
> request we get valid JSON; only when we change the representation type to Raw 
> can we notice this issue. We are also able to reproduce this bug by using the 
> "org.json:json" lib:
> Something like this:
> {code:java}
> BufferedReader inReader = new BufferedReader(
>     new InputStreamReader(connection.getInputStream()));
> String inputLine;
> StringBuilder response = new StringBuilder();
> while ((inputLine = inReader.readLine()) != null) {
>   response.append(inputLine);
> }
> inReader.close();
> // org.json's JSONObject rejects duplicate keys, so this throws for the bad response
> JSONObject o = new JSONObject(response.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7133) History Server task attempts REST API returns invalid data

2018-09-12 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612812#comment-16612812
 ] 

Jason Lowe commented on MAPREDUCE-7133:
---

The problem with doing too much code cleanup in a fix that may need to be 
backported to many older releases is that it unnecessarily complicates and adds 
risk to the backports.  If this applies cleanly all the way back to 2.7.8 then 
it may not be a big deal in this particular case, but in general it's not a 
good idea to do a lot of code refactoring that's not required to fix the 
problem.  Because of the unnecessary refactoring, this patch changes 13 files 
when the fix only requires modification to 3.  If the patch applies cleanly to 
all places it should go then I'm OK with leaving the refactoring in, otherwise 
IMHO the cleanup should be filed as a separate improvement JIRA that goes into 
trunk while the targeted fix goes into all the various active branches that 
need it.

I'm fine with leaving TaskAttemptInfo an abstract class.

bq. I agree that using XMLElementRef annotations can be unobvious.

To me that statement warrants the comment, and adding a comment also aligns 
with the HowToContribute wiki which mentions, "comment code whose function or 
rationale is not obvious".  Note that the comment doesn't have to be a 
treatise, just a one-liner explaining why it's used instead of the typical 
FIELD annotations used elsewhere.  If it's too hard to concisely explain then 
we can reference this analysis here, e.g.: "Use of XMLElementRef instead of 
XmlAccessorType is critical, see 
https://issues.apache.org/jira/browse/MAPREDUCE-7133".  A small comment next to 
the code in question makes it much easier for someone new to the code to come 
up to speed than requiring them to dig through the project's version control.
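
For illustration, a hedged sketch of the crux of the fix together with the 
kind of one-liner being asked for (the exact wording and class layout are 
assumptions, not the committed patch):

{code:java}
import java.util.ArrayList;
import javax.xml.bind.annotation.XmlElementRef;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "taskAttempts")
public class TaskAttemptsInfo {
  protected ArrayList<TaskAttemptInfo> taskAttempt = new ArrayList<>();

  // Use of XmlElementRef instead of the FIELD accessor type is critical:
  // it dispatches on the runtime type (TaskAttemptInfo vs.
  // ReduceTaskAttemptInfo), so JAXB does not emit an extra xsi:type that
  // becomes a duplicate "type" key in JSON; see MAPREDUCE-7133.
  @XmlElementRef
  public ArrayList<TaskAttemptInfo> getTaskAttempt() {
    return taskAttempt;
  }
}
{code}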


> History Server task attempts REST API returns invalid data
> --
>
> Key: MAPREDUCE-7133
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch, 
> MAPREDUCE-7133.003.patch, MAPREDUCE-7133.004.patch
>
>
> When we send a request to the History Server with the header Accept: 
> application/json 
> [https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts]
>  
> we get the following JSON:
> {code:java}
> {
> "taskAttempts": {
> "taskAttempt": [{
> "type": "reduceTaskAttemptInfo",
> "startTime": 1535372984638,
> "finishTime": 1535372986149,
> "elapsedTime": 1511,
> "progress": 100.0,
> "id": "attempt_1535363926925_0040_r_03_0",
> "rack": "/default-rack",
> "state": "SUCCEEDED",
> "status": "reduce > reduce",
> "nodeHttpAddress": "node2.cluster.com:8044",
> "diagnostics": "",
> "type": "REDUCE",
> "assignedContainerId": "container_e01_1535363926925_0040_01_06",
> "shuffleFinishTime": 1535372986056,
> "mergeFinishTime": 1535372986075,
> "elapsedShuffleTime": 1418,
> "elapsedMergeTime": 19,
> "elapsedReduceTime": 74
> }]
> }
> }
> {code}
> As you can see, the "type" property is duplicated:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, as the JSON is not valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:type="reduceTaskAttemptInfo">
>   <startTime>1535372984638</startTime>
>   <finishTime>1535372986149</finishTime>
>   <elapsedTime>1511</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0040_r_03_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>reduce &gt; reduce</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>REDUCE</type>
>   <assignedContainerId>container_e01_1535363926925_0040_01_06</assignedContainerId>
>   <shuffleFinishTime>1535372986056</shuffleFinishTime>
>   <mergeFinishTime>1535372986075</mergeFinishTime>
>   <elapsedShuffleTime>1418</elapsedShuffleTime>
>   <elapsedMergeTime>19</elapsedMergeTime>
>   <elapsedReduceTime>74</elapsedReduceTime>
> </taskAttempt>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:type="reduceTaskAttemptInfo">
> {code}
> We got an "xsi:type" attribute which is later incorrectly marshalled into a 
> duplicated field when we use the JSON format.
> This applies only to REDUCE tasks. For a MAP task we get XML without the 
> "xsi:type" attribute.
> {code:java}
> <taskAttempt>
>   <startTime>1535370756528</startTime>
>   <finishTime>1535370760318</finishTime>
>   <elapsedTime>3790</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0029_m_01_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>map &gt; sort</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>MAP</type>
>   <assignedContainerId>container_e01_1535363926925_0029_01_03</assignedContainerId>
> </taskAttempt>
> {code}
> This happens because we have two classes in one hierarchy: TaskAttemptInfo 
> for MAP tasks and ReduceTaskAttemptInfo for REDUCE tasks.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, and we later marshal all tasks 
> (map and reduce) by 

[jira] [Commented] (MAPREDUCE-7133) History Server task attempts REST API returns invalid data

2018-09-11 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611219#comment-16611219
 ] 

Jason Lowe commented on MAPREDUCE-7133:
---

Thanks for the nice analysis and patch!

TaskAttemptInfo is now an abstract class with no abstract methods?

I'm not sure the MapTaskAttemptInfo changes are really necessary here, and the 
patch is much smaller without them.  The crux of the fix appears to be the 
switch to XmlElementRef in TaskAttemptsInfo more than anything else.  If I boil 
down the patch to just the TaskAttemptsInfo and unit test changes, the unit 
test still passes.  Adding this class might be nice from a code readability 
standpoint, but I'm not seeing why it's necessary for the fix.

Speaking of the XmlElementRef, there should be a comment explaining why it's 
very important to use that rather than the FIELD accessor type.  Otherwise 
someone could be tempted to change it back later and not understand why the 
unit test cares whether attributes are present or not.
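
As a sketch, the kind of regression guard meant here could look like the 
following (class names follow the patch discussion; the no-arg constructors 
are an assumption for illustration):

{code:java}
import static org.junit.Assert.assertFalse;

import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import org.junit.Test;

public class TestTaskAttemptsInfoMarshalling {
  @Test
  public void reduceAttemptMarshalsWithoutXsiType() throws Exception {
    TaskAttemptsInfo attempts = new TaskAttemptsInfo();
    attempts.add(new ReduceTaskAttemptInfo());

    JAXBContext ctx = JAXBContext.newInstance(
        TaskAttemptsInfo.class, ReduceTaskAttemptInfo.class);
    Marshaller marshaller = ctx.createMarshaller();
    StringWriter out = new StringWriter();
    marshaller.marshal(attempts, out);

    // With XmlElementRef the subclass is marshalled by its element name;
    // an xsi:type attribute here would become a duplicate "type" key in
    // the JSON rendering (MAPREDUCE-7133).
    assertFalse(out.toString().contains("xsi:type"));
  }
}
{code}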


> History Server task attempts REST API returns invalid data
> --
>
> Key: MAPREDUCE-7133
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7133
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Major
> Attachments: MAPREDUCE-7133.001.patch, MAPREDUCE-7133.002.patch, 
> MAPREDUCE-7133.003.patch, MAPREDUCE-7133.004.patch
>
>
> When we send a request to the History Server with the header "Accept: 
> application/json" 
> [https://nodename:19888/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts|https://192.168.121.199:19890/ws/v1/history/mapreduce/jobs/job_1535363926925_0040/tasks/task_1535363926925_0040_r_03/attempts]
>  
> we get the following JSON:
> {code:java}
> {
> "taskAttempts": {
> "taskAttempt": [{
> "type": "reduceTaskAttemptInfo",
> "startTime": 1535372984638,
> "finishTime": 1535372986149,
> "elapsedTime": 1511,
> "progress": 100.0,
> "id": "attempt_1535363926925_0040_r_03_0",
> "rack": "/default-rack",
> "state": "SUCCEEDED",
> "status": "reduce > reduce",
> "nodeHttpAddress": "node2.cluster.com:8044",
> "diagnostics": "",
> "type": "REDUCE",
> "assignedContainerId": "container_e01_1535363926925_0040_01_06",
> "shuffleFinishTime": 1535372986056,
> "mergeFinishTime": 1535372986075,
> "elapsedShuffleTime": 1418,
> "elapsedMergeTime": 19,
> "elapsedReduceTime": 74
> }]
> }
> }
> {code}
> As you can see "type" property has duplicates:
> "type": "reduceTaskAttemptInfo"
> "type": "REDUCE"
> This leads to an error when parsing the response body, as the JSON is not valid.
> When we use application/xml we get the following response:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>  xsi:type="reduceTaskAttemptInfo">
>   <startTime>1535372984638</startTime>
>   <finishTime>1535372986149</finishTime>
>   <elapsedTime>1511</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0040_r_03_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>reduce &gt; reduce</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>REDUCE</type>
>   <assignedContainerId>container_e01_1535363926925_0040_01_06</assignedContainerId>
>   <shuffleFinishTime>1535372986056</shuffleFinishTime>
>   <mergeFinishTime>1535372986075</mergeFinishTime>
>   <elapsedShuffleTime>1418</elapsedShuffleTime>
>   <elapsedMergeTime>19</elapsedMergeTime>
>   <elapsedReduceTime>74</elapsedReduceTime>
> </taskAttempt>
> {code}
> Take a look at the following string:
> {code:java}
> <taskAttempt xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>  xsi:type="reduceTaskAttemptInfo">
> {code}
> We get an "xsi:type" attribute which is later incorrectly marshalled into a 
> duplicated field when we use the JSON format.
> This applies only to REDUCE tasks. For a MAP task we get XML without the 
> "xsi:type" attribute.
> {code:java}
> <taskAttempt>
>   <startTime>1535370756528</startTime>
>   <finishTime>1535370760318</finishTime>
>   <elapsedTime>3790</elapsedTime>
>   <progress>100.0</progress>
>   <id>attempt_1535363926925_0029_m_01_0</id>
>   <rack>/default-rack</rack>
>   <state>SUCCEEDED</state>
>   <status>map &gt; sort</status>
>   <nodeHttpAddress>node2.cluster.com:8044</nodeHttpAddress>
>   <diagnostics/>
>   <type>MAP</type>
>   <assignedContainerId>container_e01_1535363926925_0029_01_03</assignedContainerId>
> </taskAttempt>
> {code}
> This happens because we have two different classes in the hierarchy: MAP -> 
> TaskAttemptInfo and REDUCE -> ReduceTaskAttemptInfo.
> ReduceTaskAttemptInfo extends TaskAttemptInfo, and later we marshal all tasks 
> (map and reduce) via TaskAttemptsInfo.getTaskAttempt(). At that point we do 
> not have any information about the ReduceTaskAttemptInfo type, as we store all 
> tasks in an ArrayList. 
> During marshalling, the marshaller sees that the actual type of the task is 
> ReduceTaskAttemptInfo instead of TaskAttemptInfo and adds meta information for 
> it. That's why we get duplicated fields.
> Unfortunately, we did not catch this earlier in TestHsWebServicesAttempts 
> since we use the 
> org.codehaus.jettison.json.JSONObject library, which overwrites duplicated 
> fields. Even when we use Postman to make the request we get valid JSON; only 
> when we change the representation type to Raw do we notice this issue. We are 
> also able to reproduce this bug by using the "org.json:json" lib:
> Something like this:
> {code:java}
> BufferedReader inReader = new BufferedReader( new 
> 

[jira] [Commented] (MAPREDUCE-7136) TestMRAppMetrics should shutdown DefaultMetricsSystem after completion

2018-09-11 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611160#comment-16611160
 ] 

Jason Lowe commented on MAPREDUCE-7136:
---

I would recommend opening a single ticket per maven project, or even per JIRA 
project if it's not too huge a patch to address these.  I agree fixing these 
one at a time is going to be too much overhead.  Be sure to call out that this 
is a fix for JVM reuse tests (e.g. run via an IDE) and not a test failure seen 
when running surefire, or people may get confused when they cannot reproduce it.
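
For reference, the teardown pattern being asked for is a JUnit sketch like 
this (assuming the test registers its metrics source with the default metrics 
system, as TestMRAppMetrics does):

{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.junit.After;

public class TestMRAppMetrics {
  // ... existing tests that register the MRAppMetrics source ...

  // Shut the shared metrics system down after each test so another test
  // class running later in the same JVM does not fail with
  // "Metrics source MRAppMetrics already exists!".
  @After
  public void tearDown() {
    DefaultMetricsSystem.shutdown();
  }
}
{code}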

> TestMRAppMetrics should shutdown DefaultMetricsSystem after completion
> --
>
> Key: MAPREDUCE-7136
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7136
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7136.001.patch, 
> image-2018-09-10-14-39-57-992.png
>
>
> TestMRAppMetrics should invoke the shutdown method on DefaultMetricsSystem 
> after completion, since otherwise it can cause other tests to fail. For example, 
> TestRMContainerAllocator#testReportedAppProgress fails when run after 
> TestMRAppMetrics#testNames with the following error:
> {noformat}
> org.apache.hadoop.metrics2.MetricsException: Metrics source MRAppMetrics 
> already exists!
> {noformat}
> !image-2018-09-10-14-39-57-992.png!
> We do not catch this on trunk since the test 
> TestRMContainerAllocator#testUnsupportedMapContainerRequirement runs first and 
> "DefaultMetricsSystem.shutdown();" is invoked after completion. But since JUnit 
> does not guarantee the order of the tests, we should fix it. 
> Previous versions which run 
> testReportedAppProgress first are also affected (I hit it on version 2.7.0), 
> and it can be reproduced on trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7136) TestMRAppMetrics should shutdown DefaultMetricsSystem after completion

2018-09-11 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7136:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks, [~oshevchenko]!  I committed this to trunk.

> TestMRAppMetrics should shutdown DefaultMetricsSystem after completion
> --
>
> Key: MAPREDUCE-7136
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7136
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7136.001.patch, 
> image-2018-09-10-14-39-57-992.png
>
>
> TestMRAppMetrics should invoke the shutdown method on DefaultMetricsSystem 
> after completion, since otherwise it can cause other tests to fail. For example, 
> TestRMContainerAllocator#testReportedAppProgress fails when run after 
> TestMRAppMetrics#testNames with the following error:
> {noformat}
> org.apache.hadoop.metrics2.MetricsException: Metrics source MRAppMetrics 
> already exists!
> {noformat}
> !image-2018-09-10-14-39-57-992.png!
> We do not catch this on trunk since the test 
> TestRMContainerAllocator#testUnsupportedMapContainerRequirement runs first and 
> "DefaultMetricsSystem.shutdown();" is invoked after completion. But since JUnit 
> does not guarantee the order of the tests, we should fix it. 
> Previous versions which run 
> testReportedAppProgress first are also affected (I hit it on version 2.7.0), 
> and it can be reproduced on trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7136) TestMRAppMetrics should shutdown DefaultMetricsSystem after completion

2018-09-11 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611032#comment-16611032
 ] 

Jason Lowe commented on MAPREDUCE-7136:
---

I'm fine with making a small improvement to the tests to make them easier to 
run in other modes like via the IDE. I was mostly curious to learn about the 
use-case since that's not how the tests will run during a normal build.  That 
also explains why precommit and nightly builds have never reported this. 
However, it's important to note that problems like this are likely to keep 
cropping up since JVM reuse for separate test classes is not regularly tested.

+1 for the patch, committing this.
{quote}Could you kindly review other tickets which I opened (especially which 
are not related to tests)?
{quote}
I'll try to take a look at some of those this week.

> TestMRAppMetrics should shutdown DefaultMetricsSystem after completion
> --
>
> Key: MAPREDUCE-7136
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7136
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: MAPREDUCE-7136.001.patch, 
> image-2018-09-10-14-39-57-992.png
>
>
> TestMRAppMetrics should invoke the shutdown method on DefaultMetricsSystem 
> after completion, since otherwise it can cause other tests to fail. For example, 
> TestRMContainerAllocator#testReportedAppProgress fails when run after 
> TestMRAppMetrics#testNames with the following error:
> {noformat}
> org.apache.hadoop.metrics2.MetricsException: Metrics source MRAppMetrics 
> already exists!
> {noformat}
> !image-2018-09-10-14-39-57-992.png!
> We do not catch this on trunk since the test 
> TestRMContainerAllocator#testUnsupportedMapContainerRequirement runs first and 
> "DefaultMetricsSystem.shutdown();" is invoked after completion. But since JUnit 
> does not guarantee the order of the tests, we should fix it. 
> Previous versions which run 
> testReportedAppProgress first are also affected (I hit it on version 2.7.0), 
> and it can be reproduced on trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7136) TestMRAppMetrics should shutdown DefaultMetricsSystem after completion

2018-09-11 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610894#comment-16610894
 ] 

Jason Lowe commented on MAPREDUCE-7136:
---

Thanks for the report and patch!

TestRMContainerAllocator and TestMRAppMetrics are separate test classes and 
therefore should run in separate JVMs.  The poms specify reuseForks=false in the 
surefire setup to enforce this.  How are you getting the tests to reuse JVMs?  
That will inevitably lead to a lot of errors like this one since the project 
is explicitly built to not run tests that way.
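
For context, the relevant surefire settings in the poms look roughly like this 
(a sketch; the plugin version and surrounding configuration vary by module):

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Fork a fresh JVM per test class and never reuse it, so state
         leaked by one test class cannot break the next one. -->
    <forkCount>1</forkCount>
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
{code}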


> TestMRAppMetrics should shutdown DefaultMetricsSystem after completion
> --
>
> Key: MAPREDUCE-7136
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7136
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: MAPREDUCE-7136.001.patch, 
> image-2018-09-10-14-39-57-992.png
>
>
> TestMRAppMetrics should invoke the shutdown method on DefaultMetricsSystem 
> after completion, since otherwise it can cause other tests to fail. For example, 
> TestRMContainerAllocator#testReportedAppProgress fails when run after 
> TestMRAppMetrics#testNames with the following error:
> {noformat}
> org.apache.hadoop.metrics2.MetricsException: Metrics source MRAppMetrics 
> already exists!
> {noformat}
> !image-2018-09-10-14-39-57-992.png!
> We do not catch this on trunk since the test 
> TestRMContainerAllocator#testUnsupportedMapContainerRequirement runs first and 
> "DefaultMetricsSystem.shutdown();" is invoked after completion. But since JUnit 
> does not guarantee the order of the tests, we should fix it. 
> Previous versions which run 
> testReportedAppProgress first are also affected (I hit it on version 2.7.0), 
> and it can be reproduced on trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7135) TestTaskAttemptContainerRequest should reset UserGroupInformation

2018-09-11 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7135:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks, [~oshevchenko]!  I committed this to trunk.

> TestTaskAttemptContainerRequest should reset UserGroupInformation
> -
>
> Key: MAPREDUCE-7135
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7135
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7135.001.patch, MAPREDUCE-7135.002.patch, 
> image-2018-09-10-13-23-46-533.png
>
>
> TestTaskAttemptContainerRequest should reset UserGroupInformation after it 
> finishes, since this test caches UserGroupInformation and can cause other 
> tests to fail. For example, all tests in TestRMContainerAllocator will fail if 
> we run them after TestTaskAttemptContainerRequest (or we can replace 
> "UserGroupInformation.setLoginUser(null)" with UserGroupInformation.reset() 
> in this class).
> Also, I think the WARNING message in TestTaskAttemptContainerRequest can be 
> removed since we can reset UserGroupInformation after each test.
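
A minimal sketch of the reset pattern described above (JUnit 4; 
UserGroupInformation.reset() is the test-visible helper named in the 
description):

{code:java}
import org.apache.hadoop.security.UserGroupInformation;
import org.junit.After;

public class TestTaskAttemptContainerRequest {
  // ... existing tests that install a login user via
  // UserGroupInformation.setLoginUser(...) ...

  // Clear the cached login/UGI state after each test so later test classes
  // sharing this JVM (e.g. TestRMContainerAllocator) start from a clean
  // slate.
  @After
  public void tearDown() {
    UserGroupInformation.reset();
  }
}
{code}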



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7135) TestTaskAttemptContainerRequest should reset UserGroupInformation

2018-09-11 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610866#comment-16610866
 ] 

Jason Lowe commented on MAPREDUCE-7135:
---

Thanks for the report and patches!  +1 for patch v2.  Committing this.

> TestTaskAttemptContainerRequest should reset UserGroupInformation
> -
>
> Key: MAPREDUCE-7135
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7135
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: MAPREDUCE-7135.001.patch, MAPREDUCE-7135.002.patch, 
> image-2018-09-10-13-23-46-533.png
>
>
> TestTaskAttemptContainerRequest should reset UserGroupInformation after it 
> finishes, since this test caches UserGroupInformation and can cause other 
> tests to fail. For example, all tests in TestRMContainerAllocator will fail if 
> we run them after TestTaskAttemptContainerRequest (or we can replace 
> "UserGroupInformation.setLoginUser(null)" with UserGroupInformation.reset() 
> in this class).
> Also, I think the WARNING message in TestTaskAttemptContainerRequest can be 
> removed since we can reset UserGroupInformation after each test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-09-06 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7131:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.2
   2.9.2
   3.0.4
   2.7.8
   2.8.5
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks to [~erwaman] for the contribution and to [~pbacsko] for additional 
review!  I committed this to trunk, branch-3.1, branch-3.0, branch-2, 
branch-2.9, branch-2.8, and branch-2.7.

> Job History Server has race condition where it moves files from intermediate 
> to finished but thinks file is in intermediate
> ---
>
> Key: MAPREDUCE-7131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.8.5, 2.7.8, 3.0.4, 2.9.2, 3.1.2
>
> Attachments: MAPREDUCE-7131.1.patch, MAPREDUCE-7131.2.patch, 
> MAPREDUCE-7131.3.patch, MAPREDUCE-7131.4.patch, MAPREDUCE-7131.5.patch, 
> MAPREDUCE-7131.6.patch
>
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, 
> *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put 
> in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history 
> files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 
> is created (history, conf, and summary files will point to the intermediate 
> user directory, and state will be IN_INTERMEDIATE) and added to the 
> *jobListCache*
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because 
> the source path in the intermediate directory does not exist
> From this point on, while the new j1 *HistoryFileInfo* is in the 
> *jobListCache*, the JobHistoryServer will think the history file is in the 
> intermediate directory. If a user queries this job in the JobHistoryServer 
> UI, they will get
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load 
> history file 
> ://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition seems to still 
> exist in trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-09-06 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606285#comment-16606285
 ] 

Jason Lowe commented on MAPREDUCE-7131:
---

Thanks for updating the patch!  +1 lgtm.  Committing this.

> Job History Server has race condition where it moves files from intermediate 
> to finished but thinks file is in intermediate
> ---
>
> Key: MAPREDUCE-7131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: MAPREDUCE-7131.1.patch, MAPREDUCE-7131.2.patch, 
> MAPREDUCE-7131.3.patch, MAPREDUCE-7131.4.patch, MAPREDUCE-7131.5.patch, 
> MAPREDUCE-7131.6.patch
>
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, 
> *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put 
> in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history 
> files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 
> is created (history, conf, and summary files will point to the intermediate 
> user directory, and state will be IN_INTERMEDIATE) and added to the 
> *jobListCache*
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because 
> the source path in the intermediate directory does not exist
> From this point on, while the new j1 *HistoryFileInfo* is in the 
> *jobListCache*, the JobHistoryServer will think the history file is in the 
> intermediate directory. If a user queries this job in the JobHistoryServer 
> UI, they will get
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load 
> history file 
> ://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition seems to still 
> exist in trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-09-04 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603658#comment-16603658
 ] 

Jason Lowe commented on MAPREDUCE-7131:
---

Thanks for the report and patch!

Are the timezone changes in the code really necessary?  They only appear to be 
necessary because the unit test is hardcoding a UTC date in the paths rather 
than computing it based on the current timezone.  I think the unit test can 
call JobHistoryUtils.timestampDirectoryComponent to help compute the done 
directory path.  That would preclude the JobHistoryUtils changes and the need 
to override canonicalHistoryLogPath in the unit test.
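
As a sketch, the test could compute the expected done-directory date component 
like this (assuming the single-argument 
JobHistoryUtils.timestampDirectoryComponent(long) helper; the path layout shown 
is illustrative):

{code:java}
import org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils;

public class DoneDirPathSketch {
  public static void main(String[] args) {
    // Finish time parsed from the .jhist file name in the test fixture.
    long finishTime = 1535127026668L;

    // Compute the yyyy/MM/dd component in the JVM's current timezone, the
    // same way the history server does, instead of hardcoding a UTC date.
    String datePart = JobHistoryUtils.timestampDirectoryComponent(finishTime);

    // Illustrative done-dir path; the real test would append its serial
    // number subdirectory as well.
    System.out.println("/mr-history/done/" + datePart);
  }
}
{code}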

> Job History Server has race condition where it moves files from intermediate 
> to finished but thinks file is in intermediate
> ---
>
> Key: MAPREDUCE-7131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
> Attachments: MAPREDUCE-7131.1.patch, MAPREDUCE-7131.2.patch, 
> MAPREDUCE-7131.3.patch, MAPREDUCE-7131.4.patch
>
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, 
> *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put 
> in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history 
> files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 
> is created (history, conf, and summary files will point to the intermediate 
> user directory, and state will be IN_INTERMEDIATE) and added to the 
> *jobListCache*
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because 
> the source path in the intermediate directory does not exist
> From this point on, while the new j1 *HistoryFileInfo* is in the 
> *jobListCache*, the JobHistoryServer will think the history file is in the 
> intermediate directory. If a user queries this job in the JobHistoryServer 
> UI, they will get
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load 
> history file 
> ://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition seems to still 
> exist in trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7126) Error communicating with RM: Resource Manager doesn't recognize AttemptId: appattempt_idxx

2018-08-17 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584113#comment-16584113
 ] 

Jason Lowe commented on MAPREDUCE-7126:
---

There needs to be more information on this report to diagnose if there's a real 
issue here.  You will need to check the RM logs to see why it decided to 
respond to the AM as being unrecognized.  One theory is the AM could have been 
running on a node where the nodemanager crashed.  If that occurs then the RM 
will eventually expire the nodemanager due to lack of heartbeats and consider 
all of the containers on that node lost.  When the AM proceeds to heartbeat to 
the RM I would expect the RM to reply that the AM is no longer recognized since 
the RM considers that app attempt dead (being on a lost node).  Having the AM 
shut down (without unregistering!) is appropriate in that case.



> Error communicating with RM: Resource Manager doesn't recognize AttemptId: 
> appattempt_idxx
> --
>
> Key: MAPREDUCE-7126
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7126
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.6.0
>Reporter: Avdhesh kumar
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2018-07-17 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved MAPREDUCE-6948.
---
Resolution: Cannot Reproduce

I agree as well.  I have not seen any recent precommit failures on 3.x releases 
for this unit test.

> TestJobImpl.testUnusableNodeTransition failed
> -
>
> Key: MAPREDUCE-6948
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Jim Brennan
>Priority: Major
>  Labels: unit-test
>
> *Error Message*
> expected: but was:
> *Stacktrace*
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)
> *Standard out*
> {code}
> 2017-08-30 10:12:21,928 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> 2017-08-30 10:12:21,939 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.jobhistory.EventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,941 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to 
> jobTokenSecretManager
> 2017-08-30 10:12:21,941 WARN  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. 
> Using job token secret as shuffle secret.
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 
> because: not enabled;
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createMapTasks(1562)) - Input size for job 
> job_123456789_0001 = 0. Number of splits = 2
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job 
> job_123456789_0001 = 1
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW 
> to INITED
> 2017-08-30 10:12:21,946 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> INITED to SETUP
> 2017-08-30 10:12:21,954 INFO  [CommitterEvent Processor #0] 
> commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - 
> Processing the event EventType: JOB_SETUP
> 2017-08-30 10:12:21,978 INFO  [AsyncDispatcher event handler] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> SETUP to RUNNING
> 2017-08-30 10:12:21,983 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5
> 2017-08-30 10:12:22,000 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 1
> 2017-08-30 10:12:22,029 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 2
> 2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on 
> unusable node Mock for NodeId, hashCode: 1280187896. 
> AttemptId:attempt_123456789_0001_m_00_0
> 2017-08-30 

[jira] [Updated] (MAPREDUCE-7118) Distributed cache conflicts breaks backwards compatability

2018-07-03 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7118:
--
Status: Patch Available  (was: Open)

Attaching a patch that essentially ports the fix for MAPREDUCE-4549 to Hadoop 
3.x.


> Distributed cache conflicts breaks backwards compatability
> --
>
> Key: MAPREDUCE-7118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.1.0, 3.0.0, 3.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but 
> this was quickly downgraded to a warning in MAPREDUCE-4549.  Unfortunately 
> the latter did not go into trunk, so the fix is only in 0.23 and 2.x.  When 
> Oozie, Pig, and other downstream projects that can occasionally generate 
> distributed cache conflicts move to Hadoop 3.x the workflows that used to 
> work on 0.23 and 2.x no longer function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7118) Distributed cache conflicts breaks backwards compatability

2018-07-03 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7118:
--
Attachment: MAPREDUCE-7118.001.patch

> Distributed cache conflicts breaks backwards compatability
> --
>
> Key: MAPREDUCE-7118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but 
> this was quickly downgraded to a warning in MAPREDUCE-4549.  Unfortunately 
> the latter did not go into trunk, so the fix is only in 0.23 and 2.x.  When 
> Oozie, Pig, and other downstream projects that can occasionally generate 
> distributed cache conflicts move to Hadoop 3.x the workflows that used to 
> work on 0.23 and 2.x no longer function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7118) Distributed cache conflicts breaks backwards compatability

2018-07-03 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-7118:
-

 Summary: Distributed cache conflicts breaks backwards compatability
 Key: MAPREDUCE-7118
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.1.0, 3.0.0, 3.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe


MAPREDUCE-4503 made distributed cache conflicts break job submission, but this 
was quickly downgraded to a warning in MAPREDUCE-4549.  Unfortunately the 
latter did not go into trunk, so the fix is only in 0.23 and 2.x.  When Oozie, 
Pig, and other downstream projects that can occasionally generate distributed 
cache conflicts move to Hadoop 3.x the workflows that used to work on 0.23 and 
2.x no longer function.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7086) Add config to allow FileInputFormat to ignore directories when recursive=false

2018-05-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7086:
--
Fix Version/s: 3.1.1

I committed this to branch-3.1 as well.

> Add config to allow FileInputFormat to ignore directories when recursive=false
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7086) Add config to allow FileInputFormat to ignore directories when recursive=false

2018-05-01 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7086:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks to [~sershe] for the contribution and to [~ste...@apache.org] for 
additional review!  I committed this to trunk.

> Add config to allow FileInputFormat to ignore directories when recursive=false
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7086) Add config to allow FileInputFormat to ignore directories when recursive=false

2018-05-01 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7086:
--
   Summary: Add config to allow FileInputFormat to ignore directories when 
recursive=false  (was: FileInputFormat recursive=false fails instead of 
ignoring the directories.)
Issue Type: Improvement  (was: Bug)

> Add config to allow FileInputFormat to ignore directories when recursive=false
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-05-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460170#comment-16460170
 ] 

Jason Lowe commented on MAPREDUCE-7086:
---

My apologies for the delay.  +1 lgtm, committing this.


> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.01.patch, 
> MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450120#comment-16450120
 ] 

Jason Lowe commented on MAPREDUCE-7086:
---

Thanks for updating the patch!

Would it be clearer if "non-recursive" or something similar was in the property 
name?  Otherwise it gets confusing when recursive=true and ignore.subdirs=true 
as well.

Does mapreduce.lib.input.FileInputFormat need to be updated as well?  It looks 
like that one takes a slightly different tack, allowing directories in 
non-recursive mode but generating degenerate splits for them.  ignore.subdirs 
arguably should make it not generate any splits for directory entries, even 
degenerate ones.

I agree with Steve that it would be good to add a unit test to verify the 
behavior of the new property.  It would also be nice to cleanup the checkstyle 
warnings which are akin to the whitespace warnings.
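
To make the shape of that concrete, a sketch of the guard under discussion 
(the property name below is hypothetical, purely for illustration):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;

final class NonRecursiveSplitSketch {
  // Hypothetical property name, for illustration only.
  static final String IGNORE_SUBDIRS =
      "mapreduce.input.fileinputformat.input.dir.nonrecursive.ignore.subdirs";

  // When recursion is off and the flag is set, directory entries are
  // skipped instead of failing the whole getSplits() call.
  static long totalInputSize(Configuration conf, boolean recursive,
      FileStatus[] files) throws IOException {
    boolean ignoreDirs = !recursive && conf.getBoolean(IGNORE_SUBDIRS, false);
    long totalSize = 0;
    for (FileStatus file : files) {
      if (file.isDirectory()) {
        if (ignoreDirs) {
          continue; // ignore subdirectory entries entirely
        }
        throw new IOException("Not a file: " + file.getPath());
      }
      totalSize += file.getLen();
    }
    return totalSize;
  }
}
{code}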


> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch, MAPREDUCE-7086.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450052#comment-16450052
 ] 

Jason Lowe commented on MAPREDUCE-7026:
---

Thanks for the report and the patch!

The error posted from the Fetcher in the description is not what this is 
targeting -- that's simply the error message that is printed when the Fetcher 
meets the maximum failure threshold and decides to kill the reducer due to lack 
of sufficient shuffle progress.  This patch will not affect that error message.

Do you have some sample log output after this patch has been applied?  The 
IllegalArgumentException is because the Fetcher is trying to consume a 
ShuffleHeader when the NM has instead decided to return error text.  One 
potential problem with the approach is that the process of trying to parse a 
shuffle header could have consumed some of the error message before the error 
was thrown (i.e.: whatever bytes the readFields call consumed), and the 
approach in the patch will lose those bytes.  It'd be good to see some sample 
output to see how it's working in practice.

There's no limit to how much data this will try to buffer.  If there's a 
corrupted bit in the shuffle header and the NM is _not_ emitting an error 
message then this code could attempt to buffer many megabytes (or gigabytes!) 
of shuffle data after the shuffle header.  That could blow the heap and cause a 
teardown of the task on an error that should be retriable.  There should be a 
reasonable limit on the amount of data this will try to consume before an EOF 
is reached.  Also attempting to read compressed data looking for a line 
terminator could cause us to consume a lot of data before we find one.

Why is StringBuffer being used instead of StringBuilder?  Synchronization is 
not needed here.  Also why is an empty string being passed to the constructor 
rather than calling the default constructor?
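
A minimal sketch of the bounded-read idea (names are illustrative; the real 
Fetcher would apply this to the shuffle connection's input stream after the 
header parse fails):

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BoundedErrorText {
  // Cap how much of the response gets buffered while looking for an error
  // message, so corrupt shuffle data cannot blow the heap.
  private static final int MAX_ERROR_BYTES = 4096;

  static String readErrorText(InputStream in) throws IOException {
    byte[] buf = new byte[MAX_ERROR_BYTES];
    int total = 0;
    while (total < buf.length) {
      int n = in.read(buf, total, buf.length - total);
      if (n < 0) {
        break; // EOF before hitting the cap
      }
      total += n;
    }
    return new String(buf, 0, total, StandardCharsets.UTF_8);
  }
}
{code}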


> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
>

[jira] [Assigned] (MAPREDUCE-7086) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-7086:
-

Assignee: Sergey Shelukhin

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: MAPREDUCE-7086
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7086
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create a split in Hive that will only read files in a 
> directory and not subdirectories.
> That fails with the below error.
> Given how this error comes about (two pieces of code interact, one explicitly 
> adding directories to results without failing, and one failing on any 
> directories in results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to results 
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However the getSplits code after that computes the size like this
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which would always fail combined with the above code.
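For illustration only (not from any attached patch): a minimal sketch of how the
getSplits loop could skip directories instead of throwing when recursion is
disabled. The "ignoreDirs" flag is a hypothetical name, not the actual
FileInputFormat implementation.
{code:java}
// Hypothetical sketch: when recursion is disabled, silently skip
// directories rather than failing the whole split computation.
long totalSize = 0;                      // compute total size
for (FileStatus file : files) {          // check we have valid files
  if (file.isDirectory()) {
    if (ignoreDirs) {
      continue;                          // recursive=false: ignore directories
    }
    throw new IOException("Not a file: " + file.getPath());
  }
  totalSize += file.getLen();
}
{code}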



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7084) MRAppmaster still running after hadoop job -kill

2018-04-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444347#comment-16444347
 ] 

Jason Lowe commented on MAPREDUCE-7084:
---

Looks like the AM was not able to successfully unregister with the RM, but 
there's no indication as to why it would be having issues (no exception 
logged).  There's also no indication in the RM log as to why it was not able to 
process the unregistration.  A few followup questions:

Is this reproducible?

Is the RM log filtered for just this application or the entire log?  If it's 
filtered, I'm curious if there were any exceptions logged that did not contain 
the app ID during this time period.

Did the application transition out of the RUNNING state after the 10 minute 
expiration?  I see it lingered for a little over 10 minutes in the FINISHING 
state according to the RM log, which correlates with the AM's log indicating it 
is having difficulty unregistering.
{noformat}
2018-04-18 10:10:39,350 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: 
application_1521543893962_10055214 State change from RUNNING to KILLING
2018-04-18 10:10:39,389 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Updating application attempt appattempt_1521543893962_10055214_01 with 
final state: FINISHING
2018-04-18 10:10:39,391 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1521543893962_10055214_01 State change from RUNNING to 
FINAL_SAVING
2018-04-18 10:10:39,471 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1521543893962_10055214_01 State change from FINAL_SAVING to 
FINISHING
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
Expired:appattempt_1521543893962_10055214_01 Timed out after 600 secs
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
Unregistering app attempt : appattempt_1521543893962_10055214_01
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_1521543893962_10055214_01 State change from FINISHING to FINISHED
2018-04-18 10:21:55,073 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning 
master appattempt_1521543893962_10055214_01
2018-04-18 10:21:56,095 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application appattempt_1521543893962_10055214_01 is done. 
finalState=FINISHED
2018-04-18 10:21:56,095 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1521543893962_10055214_01_04 Container Transitioned from ACQUIRED 
to KILLED
{noformat}

Is this really against Apache Hadoop 2.4.0?  If so that is a very old release, 
and I would highly recommend upgrading to at least 2.7 to see if it occurs 
there.
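
For reference, the 10-minute expiration above matches the RM's AM liveness
monitor. A hedged example of the relevant yarn-site.xml setting (600000 ms is
the usual default; adjust only if you understand the implications):
{noformat}
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>600000</value> <!-- 600 secs, matching "Timed out after 600 secs" -->
</property>
{noformat}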


> MRAppmaster still running after hadoop job -kill 
> -
>
> Key: MAPREDUCE-7084
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7084
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.4.0
>Reporter: stefanlee
>Priority: Major
> Attachments: RM.log
>
>
> My scenario is as follows:
>  1. I kill an application with *hadoop job -kill*.
>  2. The *FinalStatus* is *KILLED*, but its *State* is still *RUNNING*.
>  3. The MRAppmaster process has already exited on the NodeManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7081) Default speculator won't speculate the last several submitted reduced task if the total task num is large

2018-04-17 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7081:
--
Summary: Default speculator won't speculate the last several submitted 
reduced task if the total task num is large  (was: Default speculator won't 
sepculate the last several submitted reduced task if the total task num is 
large)

> Default speculator won't speculate the last several submitted reduced task if 
> the total task num is large
> -
>
> Key: MAPREDUCE-7081
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7081
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.9.0, 2.7.5
>Reporter: Zhizhen Hou
>Priority: Major
>
> DefaultSpeculator speculates each task at most once.  By default, the number of 
> tasks it may speculate is max(max(10, 0.01 * total tasks), 0.1 * running tasks).
> I set mapreduce.job.reduce.slowstart.completedmaps = 1 so reduces start only 
> after all map tasks have finished. The cluster has 1000 vcores, and the job has 
> 5000 reduce tasks. At first, 1000 reduce tasks run simultaneously, so at most 
> 0.1 * 1000 = 100 tasks can be speculated. Reduce tasks with little data finish 
> quickly, and by default the speculator launches one speculative attempt per 
> second; a task often looks worth speculating simply because it has more data to 
> process, so the speculator burns through its 100 opportunities within 100 
> seconds. When 4900 reduces have finished, if one remaining reduce has a lot of 
> data to process and lands on a slow machine, the speculation budget is already 
> exhausted and it will never be speculated, which can increase the job's 
> execution time significantly.
> In short, the speculation budget may be wasted early on reduces whose longer 
> runtimes merely reflect more input data, leaving no opportunities near the end 
> of the job for the last few running tasks, because the budget is tied to the 
> number of running tasks.
> In my opinion, the number of running tasks should not determine the speculation 
> budget. The number of tasks that may be speculated could instead be scaled by 
> the square of the fraction of finished tasks. For example, when ninety percent 
> of the tasks have finished, only 0.9 * 0.9 = 0.81 of the budget would be 
> usable, leaving enough opportunities for the later tasks.
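
A rough sketch of the proposed cap (illustrative only; the base formula mirrors
the description above, but the squared scaling is the reporter's proposal, not
committed DefaultSpeculator code, and all names here are assumed):
{code:java}
// Sketch of the proposal: scale the speculation budget by the square of
// the completed fraction so opportunities remain for the last few tasks.
static int allowedSpeculations(int totalTasks, int runningTasks,
    int finishedTasks, int speculationsAlreadyDone) {
  // Today's cap, per the description:
  int base = Math.max(Math.max(10, (int) (0.01 * totalTasks)),
      (int) (0.1 * runningTasks));
  double finished = (double) finishedTasks / totalTasks;
  // e.g. at 90% complete only 0.81 of the base budget is usable
  int usable = (int) (base * finished * finished);
  return Math.max(0, usable - speculationsAlreadyDone);
}
{code}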



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7080) Default speculator won't sepculate the last several submitted reduced task if the total task num is large

2018-04-17 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved MAPREDUCE-7080.
---
Resolution: Duplicate

Closing as a duplicate of MAPREDUCE-7081.

> Default speculator won't sepculate the last several submitted reduced task if 
> the total task num is large
> -
>
> Key: MAPREDUCE-7080
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7080
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.7.5
>Reporter: Zhizhen Hou
>Priority: Major
>
> DefaultSpeculator speculates each task at most once.
> By default, the number of tasks it may speculate is max(max(10, 0.01 * total 
> tasks), 0.1 * running tasks).
> I set mapreduce.job.reduce.slowstart.completedmaps = 1 so reduces start only 
> after all map tasks have finished.
> The cluster has 1000 vcores, and the job has 5000 reduce tasks.
> At first, 1000 reduce tasks run simultaneously, so at most 0.1 * 1000 = 100 
> tasks can be speculated. Reduce tasks with little data finish quickly, and by 
> default the speculator launches one speculative attempt per second; a task 
> often looks worth speculating simply because it has more data to process, so 
> the speculator burns through its 100 opportunities within 100 seconds.
> When 4900 reduces have finished, if one remaining reduce has a lot of data to 
> process and lands on a slow machine, the speculation budget is already 
> exhausted and it will never be speculated, which can increase the job's 
> execution time significantly.
> In short, the speculation budget may be wasted early on reduces whose longer 
> runtimes merely reflect more input data, leaving no opportunities near the end 
> of the job for the last few running tasks, because the budget is tied to the 
> number of running tasks.
>  
> In my opinion, the number of tasks that may be speculated could be scaled by 
> the square of the fraction of finished tasks. For example, when ninety percent 
> of the tasks have finished, only 0.9 * 0.9 = 0.81 of the budget would be 
> usable, leaving enough opportunities for the later tasks.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7069:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
 Release Note: 
Environment variables for MapReduce tasks can now be specified as separate 
properties, e.g.:
mapreduce.map.env.VARNAME=value
mapreduce.reduce.env.VARNAME=value
yarn.app.mapreduce.am.env.VARNAME=value
yarn.app.mapreduce.am.admin.user.env.VARNAME=value
This form of specifying environment variables is useful when the value of an 
environment variable contains commas.
   Status: Resolved  (was: Patch Available)

Thanks to [~Jim_Brennan] for the contribution and to [~shaneku...@gmail.com] 
for additional review!  I committed this to trunk.
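
For example, a job submission can then set a comma-containing value per
variable via the standard GenericOptionsParser -D syntax (the jar, class, and
variable names below are made up for illustration):
{noformat}
hadoop jar my-app.jar MyJob \
  -Dmapreduce.map.env.EXTRA_OPTS="opt1,opt2,opt3" \
  -Dmapreduce.reduce.env.EXTRA_OPTS="opt1,opt2,opt3" \
  input output
{noformat}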

> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch, MAPREDUCE-7069.007.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7069:
--
Release Note: 
Environment variables for MapReduce tasks can now be specified as separate 
properties, e.g.:
mapreduce.map.env.VARNAME=value
mapreduce.reduce.env.VARNAME=value
yarn.app.mapreduce.am.env.VARNAME=value
yarn.app.mapreduce.am.admin.user.env.VARNAME=value
This form of specifying environment variables is useful when the value of an 
environment variable contains commas.

  was:
Environment variables for MapReduce tasks can now be specified as separate 
properties, e.g.:
mapreduce.map.env.VARNAME=value
mapreduce.reduce.env.VARNAME=value
yarn.app.mapreduce.am.env.VARNAME=value
yarn.app.mapreduce.am.admin.user.env.VARNAME=value
This form of specifying environment variables is useful when the value of an 
environement variable contains commas.


> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch, MAPREDUCE-7069.007.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435838#comment-16435838
 ] 

Jason Lowe commented on MAPREDUCE-7069:
---

+1 lgtm as well.  The two unit tests that are failing are also failing in other 
precommit builds.  Filed MAPREDUCE-7078 and MAPREDUCE-7079 to track those two 
unit test failures.

Committing this.

> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch, MAPREDUCE-7069.007.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2018-04-12 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-7079:
-

 Summary: TestMRIntermediateDataEncryption is failing in precommit 
builds
 Key: MAPREDUCE-7079
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jason Lowe


TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
which causes the unit tests in jobclient to not pass cleanly during precommit 
builds. From sample precommit console output, note the lack of a test results 
line when the test is run:
{noformat}
[INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 s 
- in org.apache.hadoop.mapred.TestSequenceFileInputFormat
[INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
[INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 s 
- in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
[...]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:14 h
[INFO] Finished at: 2018-04-12T04:27:06+00:00
[INFO] Final Memory: 24M/594M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}
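
One way to re-run just this test locally when investigating (standard Maven
surefire usage; the module path is assumed from the repo layout):
{noformat}
cd hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
mvn test -Dtest=TestMRIntermediateDataEncryption
{noformat}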




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7078) TestPipeApplication is failing in precommit builds

2018-04-12 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-7078:
-

 Summary: TestPipeApplication is failing in precommit builds
 Key: MAPREDUCE-7078
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7078
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jason Lowe


TestPipeApplication is either timing out or tearing down the JVM which causes 
the unit tests in jobclient to not pass cleanly during precommit builds.  From 
sample precommit console output, note the lack of a test results line when the 
test is run:
{noformat}
[INFO] Running org.apache.hadoop.mapred.TestIFile
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1 s - in 
org.apache.hadoop.mapred.TestIFile
[INFO] Running org.apache.hadoop.mapred.pipes.TestPipeApplication
[INFO] Running org.apache.hadoop.mapred.pipes.TestPipesNonJavaInputFormat
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.02 s - 
in org.apache.hadoop.mapred.pipes.TestPipesNonJavaInputFormat
[...]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:14 h
[INFO] Finished at: 2018-04-12T04:27:06+00:00
[INFO] Final Memory: 24M/594M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434021#comment-16434021
 ] 

Jason Lowe commented on MAPREDUCE-7069:
---

Given this is XML and not HTML I think we should avoid the use of HTML tags for 
now.  Those tags will make it harder to read if the consumer is not viewing it 
with an HTML viewer.  For the mapred-default.xml descriptions I think we just 
need to mention that variables can also be specified as separate properties and 
give a simple example demonstrating the syntax.
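
For instance, the mapred-default.xml description could read something like this
(illustrative wording only, not the committed text):
{noformat}
<property>
  <name>mapreduce.map.env</name>
  <value></value>
  <description>User-added environment variables for the map task processes,
  specified as a comma-separated list (VAR1=value1,VAR2=value2). Variables
  can also be specified as separate properties of the form
  mapreduce.map.env.VARNAME=value, which allows values containing commas.
  </description>
</property>
{noformat}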

> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433021#comment-16433021
 ] 

Jason Lowe commented on MAPREDUCE-7069:
---

Thanks for updating the patch!  We're getting close.

The example command-line was updated to add the -D flags but they are still 
missing in the paragraph preceding it in MapReduceTutorial.md.

Looking at the docs generated at 
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
 I'm wondering if it would be more useful to add the description for the .[var] 
form of the properties to the base property instead of having separate, 
commented-out properties.  Users probably won't ever notice the commented-out 
text in that file, especially since it's buried in a .jar when deployed, but 
many would notice the docs generated for properties that aren't commented out.  
Thoughts?



> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, 
> MAPREDUCE-7069.006.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431378#comment-16431378
 ] 

Jason Lowe commented on MAPREDUCE-7069:
---

Thanks for updating the patch!  Looks like TestPipeApplication and 
TestMRIntermediateDataEncryption both exited abnormally.  I was not able to 
reproduce either failure locally with the patch applied.

In mapred-default.xml "UEnvironment" s/b "Environment".

The example added to the tutorial is missing the "-D" flag required to be in 
front of all of the example options.  These properties are not valid 
command-line options directly.

Nit: "alternateForm" confusd me as a parameter name to testAMStandardEnv at 
first, and the comment in the method body explaining what it did was key to 
understanding it.  Maybe "useSeparateEnvProps" or something similar would be a 
better parameter name?

Nit: It would be nice to have some whitespace between the unit test methods for 
readability.

I'm not sure the DockerContainers.md change is necessary.  We still support the 
old, single-property way to set a list of environment variables that doesn't 
contain commas in the values.  I'm not sure we need to make the example more 
complicated given the variable settings don't have commas and therefore 
wouldn't require the separate property form.

> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, 
> MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually

2018-04-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425770#comment-16425770
 ] 

Jason Lowe commented on MAPREDUCE-7069:
---

Thanks for the patch!

It's a bit odd and inefficient that setVMEnv calls 
MRApps.setEnvFromInputProperty twice.  I think it would be clearer and more 
efficient to call it once, place the results in a temporary map (like it 
already does in the second call), then only set HADOOP_ROOT_LOGGER and 
HADOOP_CLIENT_OPTS in the environment if they are not set in the temporary map. 
 Then at the end we can simply call addAll to dump the contents of the 
temporary map into the environment map.
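
A rough sketch of that restructuring (method and variable names are assumed
from the discussion above, not verified against the patch):
{code:java}
// Sketch: call setEnvFromInputProperty once into a temporary map, apply
// defaults only for variables the user did not set, then merge everything.
Map<String, String> userEnv = new HashMap<>();
MRApps.setEnvFromInputProperty(userEnv, propName, defaultPropValue, conf);
if (!userEnv.containsKey("HADOOP_ROOT_LOGGER")) {
  environment.put("HADOOP_ROOT_LOGGER", logLevel + ",console");
}
if (!userEnv.containsKey("HADOOP_CLIENT_OPTS")) {
  environment.put("HADOOP_CLIENT_OPTS", javaOpts);
}
environment.putAll(userEnv);
{code}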

The example documentation in JobConf is confusing.  It uses 
"MAPRED_MAP_TASK_ENV" and "MAPRED_REDUCE_TASK_ENV" but those literal strings 
should not be used in the property name.  It would be clearer if this used 
"mapreduce.map.env" and "mapreduce.reduce.env" in the examples.  Either that or 
give the example in the Java realm with something like set(MAPRED_MAP_TASK_ENV 
+ ".varName", varValue) so it's clearly not a literal string in the property 
name.  My preference is the former.
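
E.g., the two styles side by side (the variable name is hypothetical):
{code:java}
// Preferred in docs: the literal property name.
conf.set("mapreduce.map.env.LD_LIBRARY_PATH", "/usr/local/lib");
// Java-realm alternative: the JobConf constant plus the variable suffix.
conf.set(JobConf.MAPRED_MAP_TASK_ENV + ".LD_LIBRARY_PATH", "/usr/local/lib");
{code}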

The relevant property descriptions in mapred-default.xml should be updated to 
reflect the new functionality.

It would be good to update MapReduceTutorial.md to document the options for 
passing environment variables to tasks.

There are a number of comments in setEnvFromString that should be fixed up.  I 
realize this is mostly cut-n-paste from the old setEnvFromInputString, but 
since we're refactoring it would be nice to clean it up a bit in the process.  
There's no such thing as a tt (tasktracker) in YARN, and the comments imply 
this is only called to set up the env by a nodemanager for a child process.  
That's not always the case.  "note" s/b "not", etc.

For javadoc comments it's not necessary to state the type of the variable after 
the variable name.  Javadoc can automatically extract this from the method 
signature.

Nit: setEnvFromInputStringMap does not need to be public.

Would it be easier to call tmpEnv.addAll(inputMap) and pass tmpEnv instead of 
inputMap?  Then we don't need to explicitly iterate the map.

The unit test should add new properties with commas and/or equal signs in the 
value and verify the values come through in the environment map.

Does it make sense to split some of the unit test up into separate tests?  For 
example the null input test can easily stand by itself.  Separate tests make it 
easier to identify what's working and what's broken rather than a stacktrace 
with a line number in the middle of a large unit test that is testing many 
different aspects.


> Add ability to specify user environment variables individually
> --
>
> Key: MAPREDUCE-7069
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an 
> environment variable that contains commas via {{mapreduce.map.env}}, 
> mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to 
> specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7068) Fix Reduce Exception was overwrited by ReduceTask

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409763#comment-16409763
 ] 

Jason Lowe commented on MAPREDUCE-7068:
---

Is there a real problem here or a theoretical one?  The JIRA started out 
stating cleanupWithLogger is smashing exceptions which isn't the case, and the 
YarnUncaughtExceptionHandler will log any catastrophic errors that would 
prevent the task from running successfully.  I'm not seeing an issue here.

> Fix Reduce Exception was overwrited by ReduceTask
> -
>
> Key: MAPREDUCE-7068
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7068
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 2.7.1
> Environment: CentOS 7
> Hadoop-2.7.1
> Hive-1.2.1
>Reporter: tartarus
>Priority: Major
> Attachments: MAPREDUCE_7068.patch
>
>
>  
> {code:java}
> try {
>   //increment processed counter only if skipping feature is enabled
>   boolean incrProcCount = SkipBadRecords.getReducerMaxSkipGroups(job)>0 &&
> SkipBadRecords.getAutoIncrReducerProcCount(job);
>   
>   ReduceValuesIterator values = isSkipping() ? 
>   new SkippingReduceValuesIterator(rIter, 
>   comparator, keyClass, valueClass, 
>   job, reporter, umbilical) :
>   new ReduceValuesIterator(rIter, 
>   comparator, keyClass, valueClass,
>   job, reporter);
>   values.informReduceProgress();
>   while (values.more()) {
> reduceInputKeyCounter.increment(1);
> reducer.reduce(values.getKey(), values, collector, reporter);
> if(incrProcCount) {
>   reporter.incrCounter(SkipBadRecords.COUNTER_GROUP, 
>   SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS, 1);
> }
> values.nextKey();
> values.informReduceProgress();
>   }
>   reducer.close();
>   reducer = null;
>   
>   out.close(reporter);
>   out = null;
> } finally {
>   IOUtils.cleanupWithLogger(LOG, reducer);
>   closeQuietly(out, reporter);
> }
>   }
> {code}
> If {color:#d04437}reducer.close();{color} throws an exception, 
> {color:#d04437}reducer = null;{color} will not run, so the 
> {color:#d04437}IOUtils.cleanupWithLogger(LOG, reducer);{color} call in the 
> finally block will invoke close() again; if that second close throws, it could 
> overwrite the original exception from reducer.close(). We should catch it and 
> log it to make such issues easier to track down.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7068) Fix Reduce Exception was overwrited by ReduceTask

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409685#comment-16409685
 ] 

Jason Lowe commented on MAPREDUCE-7068:
---

Errors are much more severe than Exceptions.  If we're getting something like 
an OOMError then that's arguably more important than whatever exception occurred 
in the try block.  If we're worried about lack of logging of the error, in 
practice the YarnUncaughtExceptionHandler is going to catch this anyway 
(installed by YarnChild when the task runs).

> Fix Reduce Exception was overwrited by ReduceTask
> -
>
> Key: MAPREDUCE-7068
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7068
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 2.7.1
> Environment: CentOS 7
> Hadoop-2.7.1
> Hive-1.2.1
>Reporter: tartarus
>Priority: Major
> Attachments: MAPREDUCE_7068.patch
>
>
>  
> {code:java}
> try {
>   //increment processed counter only if skipping feature is enabled
>   boolean incrProcCount = SkipBadRecords.getReducerMaxSkipGroups(job)>0 &&
> SkipBadRecords.getAutoIncrReducerProcCount(job);
>   
>   ReduceValuesIterator values = isSkipping() ? 
>   new SkippingReduceValuesIterator(rIter, 
>   comparator, keyClass, valueClass, 
>   job, reporter, umbilical) :
>   new ReduceValuesIterator(rIter, 
>   comparator, keyClass, valueClass,
>   job, reporter);
>   values.informReduceProgress();
>   while (values.more()) {
> reduceInputKeyCounter.increment(1);
> reducer.reduce(values.getKey(), values, collector, reporter);
> if(incrProcCount) {
>   reporter.incrCounter(SkipBadRecords.COUNTER_GROUP, 
>   SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS, 1);
> }
> values.nextKey();
> values.informReduceProgress();
>   }
>   reducer.close();
>   reducer = null;
>   
>   out.close(reporter);
>   out = null;
> } finally {
>   IOUtils.cleanupWithLogger(LOG, reducer);
>   closeQuietly(out, reporter);
> }
>   }
> {code}
> If {color:#d04437}reducer.close();{color} throws an exception, 
> {color:#d04437}reducer = null;{color} will not run, so the 
> {color:#d04437}IOUtils.cleanupWithLogger(LOG, reducer);{color} call in the 
> finally block will invoke close() again; if that second close throws, it could 
> overwrite the original exception from reducer.close(). We should catch it and 
> log it to make such issues easier to track down.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7068) Fix Reduce Exception was overwrited by ReduceTask

2018-03-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16409561#comment-16409561
 ] 

Jason Lowe commented on MAPREDUCE-7068:
---

Is this reported against 2.7 or something else?  IOUtils.cleanupWithLogger 
doesn't exist in 2.7.

I'm also not seeing how the finally block is smashing an exception with its 
own, since cleanupWithLogger looks like this:
{code}
  public static void cleanupWithLogger(Logger logger,
  java.io.Closeable... closeables) {
for (java.io.Closeable c : closeables) {
  if (c != null) {
try {
  c.close();
} catch (Throwable e) {
  if (logger != null) {
logger.debug("Exception in closing {}", c, e);
  }
}
  }
}
  }
{code}
and closeQuietly similarly suppresses all Exceptions.
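
For completeness, a self-contained illustration of the masking concern the
reporter raises; with Java's try-with-resources a close() failure is attached
as a suppressed exception rather than replacing the primary one (this is
generic Java behavior, not the proposed patch):
{code:java}
import java.io.Closeable;
import java.io.IOException;

public class SuppressedDemo {
  public static void main(String[] args) {
    try {
      run();
    } catch (IOException e) {
      System.out.println("primary: " + e.getMessage());
      for (Throwable s : e.getSuppressed()) {
        System.out.println("suppressed: " + s.getMessage()); // close failure kept
      }
    }
  }

  static void run() throws IOException {
    // Closeable is a functional interface, so a lambda works here.
    try (Closeable c = () -> { throw new IOException("close failed"); }) {
      throw new IOException("primary failure");
    }
  }
}
{code}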


> Fix Reduce Exception was overwrited by ReduceTask
> -
>
> Key: MAPREDUCE-7068
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7068
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 2.7.1
> Environment: CentOS 7
> Hadoop-2.7.1
> Hive-1.2.1
>Reporter: tartarus
>Priority: Major
> Attachments: MAPREDUCE_7068.patch
>
>
>  
> {code:java}
> try {
>   //increment processed counter only if skipping feature is enabled
>   boolean incrProcCount = SkipBadRecords.getReducerMaxSkipGroups(job)>0 &&
> SkipBadRecords.getAutoIncrReducerProcCount(job);
>   
>   ReduceValuesIterator values = isSkipping() ? 
>   new SkippingReduceValuesIterator(rIter, 
>   comparator, keyClass, valueClass, 
>   job, reporter, umbilical) :
>   new ReduceValuesIterator(rIter, 
>   comparator, keyClass, valueClass,
>   job, reporter);
>   values.informReduceProgress();
>   while (values.more()) {
> reduceInputKeyCounter.increment(1);
> reducer.reduce(values.getKey(), values, collector, reporter);
> if(incrProcCount) {
>   reporter.incrCounter(SkipBadRecords.COUNTER_GROUP, 
>   SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS, 1);
> }
> values.nextKey();
> values.informReduceProgress();
>   }
>   reducer.close();
>   reducer = null;
>   
>   out.close(reporter);
>   out = null;
> } finally {
>   IOUtils.cleanupWithLogger(LOG, reducer);
>   closeQuietly(out, reporter);
> }
>   }
> {code}
> If {color:#d04437}reducer.close();{color} throws an exception, 
> {color:#d04437}reducer = null;{color} will not run, so the 
> {color:#d04437}IOUtils.cleanupWithLogger(LOG, reducer);{color} call in the 
> finally block will invoke close() again; if that second close throws, it could 
> overwrite the original exception from reducer.close(). We should catch it and 
> log it to make such issues easier to track down.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2018-03-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408655#comment-16408655
 ] 

Jason Lowe commented on MAPREDUCE-6948:
---

It looks like this was reported against 3.0.0-alpha4.  All of the proposed 
fixes for this would also be in 3.0.0-alpha4, so if Haibo can confirm this 
occurred on alpha4 then it could still be a valid issue.

> TestJobImpl.testUnusableNodeTransition failed
> -
>
> Key: MAPREDUCE-6948
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Jim Brennan
>Priority: Major
>  Labels: unit-test
>
> *Error Message*
> expected: but was:
> *Stacktrace*
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)
> *Standard out*
> {code}
> 2017-08-30 10:12:21,928 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> 2017-08-30 10:12:21,939 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.jobhistory.EventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,941 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to 
> jobTokenSecretManager
> 2017-08-30 10:12:21,941 WARN  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. 
> Using job token secret as shuffle secret.
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 
> because: not enabled;
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createMapTasks(1562)) - Input size for job 
> job_123456789_0001 = 0. Number of splits = 2
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job 
> job_123456789_0001 = 1
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW 
> to INITED
> 2017-08-30 10:12:21,946 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> INITED to SETUP
> 2017-08-30 10:12:21,954 INFO  [CommitterEvent Processor #0] 
> commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - 
> Processing the event EventType: JOB_SETUP
> 2017-08-30 10:12:21,978 INFO  [AsyncDispatcher event handler] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> SETUP to RUNNING
> 2017-08-30 10:12:21,983 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5
> 2017-08-30 10:12:22,000 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 1
> 2017-08-30 10:12:22,029 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 2
> 2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on 
> unusable node 

[jira] [Updated] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

2018-03-14 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7064:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.2
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks, [~pbacsko]!  I committed this to trunk, branch-3.1, and branch-3.0.

> Flaky test TestTaskAttempt#testReducerCustomResourceTypes
> -
>
> Key: MAPREDUCE-7064
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
> Attachments: MAPREDUCE-7064-001.patch, MAPREDUCE-7064-002.patch, 
> MAPREDUCE-7064-003.patch
>
>
> The test {{TestTaskAttempt#testReducerCustomResourceType}} can occasionally 
> fail with the following error:
> {noformat}
> org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 
> 'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: 
> COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 
> 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, 
> minimum allocation: 0, maximum allocation: 9223372036854775807]
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535)
> {noformat}
> The root cause seems to be an interference from previous tests that start 
> instance(s) of {{FailingAttemptsMRApp}} or 
> {{FailingAttemptsDuringAssignedMRApp}}. When I disabled these tests, 
> {{testReducerCustomResourceTypes}} always passed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

2018-03-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399209#comment-16399209
 ] 

Jason Lowe commented on MAPREDUCE-7064:
---

+1 lgtm.  I'll clean up the unused import on the commit.


> Flaky test TestTaskAttempt#testReducerCustomResourceTypes
> -
>
> Key: MAPREDUCE-7064
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7064-001.patch, MAPREDUCE-7064-002.patch, 
> MAPREDUCE-7064-003.patch
>
>
> The test {{TestTaskAttempt#testReducerCustomResourceType}} can occasionally 
> fail with the following error:
> {noformat}
> org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 
> 'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: 
> COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 
> 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, 
> minimum allocation: 0, maximum allocation: 9223372036854775807]
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535)
> {noformat}
> The root cause seems to be an interference from previous tests that start 
> instance(s) of {{FailingAttemptsMRApp}} or 
> {{FailingAttemptsDuringAssignedMRApp}}. When I disabled these tests, 
> {{testReducerCustomResourceTypes}} always passed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

2018-03-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398694#comment-16398694
 ] 

Jason Lowe commented on MAPREDUCE-7064:
---

Thanks for updating the patch!  Just noticed one more thing I missed in the 
first review.  There's one test block in testMRAppHistoryForTAFailedInAssigned 
that didn't close the app like all the other ones did.  Intentional?
{code}
  // test TA_FAILMSG for reduce
  app =
  new FailingAttemptsDuringAssignedMRApp(0, 1,
  TaskAttemptEventType.TA_FAILMSG);
  testTaskAttemptAssignedFailHistory(app);

  // test TA_FAILMSG_BY_CLIENT for map
  app =
  new FailingAttemptsDuringAssignedMRApp(1, 0,
  TaskAttemptEventType.TA_FAILMSG_BY_CLIENT);
{code}
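
Presumably the fix is just to mirror the other blocks, e.g. (assumed, based on
the surrounding test code, not taken from a patch):
{code:java}
  // test TA_FAILMSG for reduce
  app =
      new FailingAttemptsDuringAssignedMRApp(0, 1,
          TaskAttemptEventType.TA_FAILMSG);
  testTaskAttemptAssignedFailHistory(app);
  app.close();   // assumed: stop this MRApp like the other test blocks do
{code}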


> Flaky test TestTaskAttempt#testReducerCustomResourceTypes
> -
>
> Key: MAPREDUCE-7064
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7064-001.patch, MAPREDUCE-7064-002.patch
>
>
> The test {{TestTaskAttempt#testReducerCustomResourceType}} can occasionally 
> fail with the following error:
> {noformat}
> org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 
> 'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: 
> COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 
> 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, 
> minimum allocation: 0, maximum allocation: 9223372036854775807]
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535)
> {noformat}
> The root cause seems to be an interference from previous tests that start 
> instance(s) of {{FailingAttemptsMRApp}} or 
> {{FailingAttemptsDuringAssignedMRApp}}. When I disabled these tests, 
> {{testReducerCustomResourceTypes}} always passed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes

2018-03-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397175#comment-16397175
 ] 

Jason Lowe commented on MAPREDUCE-7064:
---

Thanks for the patch!  Looks like a reasonable approach to me.  Curious, why 
the isInState(STATE.STOPPED) check before calling close but only in one 
instance?  Looks like the close method already does nothing if it is in the 
stopped state, so this looks like extraneous code.


> Flaky test TestTaskAttempt#testReducerCustomResourceTypes
> -
>
> Key: MAPREDUCE-7064
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7064-001.patch
>
>
> The test {{TestTaskAttempt#testReducerCustomResourceType}} can occasionally 
> fail with the following error:
> {noformat}
> org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 
> 'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: 
> COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 
> 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, 
> minimum allocation: 0, maximum allocation: 9223372036854775807]
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535)
> {noformat}
> The root cause seems to be an interference from previous tests that start 
> instance(s) of {{FailingAttemptsMRApp}} or 
> {{FailingAttemptsDuringAssignedMRApp}}. When I disabled these tests, 
> {{testReducerCustomResourceTypes}} always passed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6930) mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present twice in mapred-default.xml

2018-03-09 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-6930:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.2
   2.8.4
   2.9.1
   2.10.0
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks, [~Sen Zhao]!  I committed this to trunk, branch-3.1, branch-3.0, 
branch-2, branch-2.9, and branch-2.8.

> mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present 
> twice in mapred-default.xml
> -
>
> Key: MAPREDUCE-6930
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6930
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.4, 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Sen Zhao
>Priority: Major
>  Labels: newbie
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: MAPREDUCE-6930.001.patch
>
>
> The second set should be deleted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6930) mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present twice in mapred-default.xml

2018-03-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393129#comment-16393129
 ] 

Jason Lowe commented on MAPREDUCE-6930:
---

Thanks for the patch!

+1 lgtm.  Committing this.

> mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present 
> twice in mapred-default.xml
> -
>
> Key: MAPREDUCE-6930
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6930
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.4, 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Sen Zhao
>Priority: Major
>  Labels: newbie
> Attachments: MAPREDUCE-6930.001.patch
>
>
> The second set should be deleted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-6930) mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present twice in mapred-default.xml

2018-03-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-6930:
-

Assignee: Sen Zhao

> mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores are both present 
> twice in mapred-default.xml
> -
>
> Key: MAPREDUCE-6930
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6930
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.7.4, 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Sen Zhao
>Priority: Major
>  Labels: newbie
>
> The second set should be deleted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7059) Downward Compatibility issue: MR job fails because of unknown setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster

2018-03-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382079#comment-16382079
 ] 

Jason Lowe commented on MAPREDUCE-7059:
---

Should this go into branch-3.0 as well given MAPREDUCE-6954 is also there?

> Downward Compatibility issue: MR job fails because of unknown 
> setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster
> ---
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Critical
> Fix For: 3.1.0, 3.2.0
>
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch, 
> MAPREDUCE-7059.006.patch
>
>
> Running teragen fails when the client is hadoop-3.1 and the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The failure happens because 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (fs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) fs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) {
>       throw e;
>     } else {
>       LOG.warn("hdfs server does not have method setErasureCodingPolicy,"
>           + " skipping disableErasureCodingForPath", e);
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> 

[jira] [Updated] (MAPREDUCE-7060) Cherry Pick PathOutputCommitter class/factory to branch-3.0 & 2.10

2018-02-28 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7060:
--
Description: 
It's easier for downstream apps like Spark to pick up the new 
PathOutputCommitter superclass if it is there on 2.10+, even if the S3A 
committer isn't there. 

Adding the interface & binding stuff of MAPREDUCE-6956 allows for third party 
committers to be deployed. 

I'm not proposing a backport of the HADOOP-13786 committer: that's Java 8, 
S3Guard, etc. Too traumatic. All I want here is to allow downstream code to 
pick up the new interface and so be able to support it and other store 
committers when they become available.

  was:
It's easier for downstream apps like Spark to pick up the new 
PathOutputCommitter superclass if it is there on 2.10+, even if the S3A 
committer isn't there. 

Adding the interface & binding stuff of HADOOP-6956 allows for third party 
committers to be deployed. 

I'm not proposing a backport of the HADOOP-13786 committer: that's Java 8, 
S3Guard, etc. Too traumatic. All I want here is to allow downstream code to be 
able to pick up the new interface and so be able to support it and other store 
committers when available


+1 lgtm.

> Cherry Pick PathOutputCommitter class/factory to branch-3.0 & 2.10
> --
>
> Key: MAPREDUCE-7060
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7060
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.10.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: MAPREDUCE-7060-branch-3.0-001.patch, 
> MAPREDUCE-7060-branch-3.0-002.patch
>
>
> It's easier for downstream apps like Spark to pick up the new 
> PathOutputCommitter superclass if it is there on 2.10+, even if the S3A 
> committer isn't there. 
> Adding the interface & binding stuff of MAPREDUCE-6956 allows for third party 
> committers to be deployed. 
> I'm not proposing a backport of the HADOOP-13786 committer: that's Java 8, 
> S3Guard, etc. Too traumatic. All I want here is to allow downstream code to 
> pick up the new interface and so be able to support it and other store 
> committers when they become available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster

2018-02-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380330#comment-16380330
 ] 

Jason Lowe commented on MAPREDUCE-7059:
---

+1 lgtm.

> Compatibility issue: job submission fails with RpcNoSuchMethodException when 
> submitting to 2.x cluster
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Critical
> Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, 
> MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch, 
> MAPREDUCE-7059.006.patch
>
>
> Running teragen fails when the client is hadoop-3.1 and the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The failure happens because 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (fs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) fs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) {
>       throw e;
>     } else {
>       LOG.warn("hdfs server does not have method setErasureCodingPolicy,"
>           + " skipping disableErasureCodingForPath", e);
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> 

[jira] [Commented] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster

2018-02-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376919#comment-16376919
 ] 

Jason Lowe commented on MAPREDUCE-7059:
---

Thanks for the report!  This was introduced by MAPREDUCE-6954.  I'm not sure 
there's a good way for an HDFS client to know which version the server has, so 
there may not be a cleaner solution than pulling apart the exception and 
checking for no such method.

> Compatibility issue: job submission fails with RpcNoSuchMethodException when 
> submitting to 2.x cluster
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Minor
>
> Running teragen fails when the client is hadoop-3.1 and the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The failure happens because 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (fs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) fs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     if (!(e.getCause() instanceof RpcNoSuchMethodException)) {
>       throw e;
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2665)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2662)
>   at 
> 

[jira] [Moved] (MAPREDUCE-7059) Compatibility issue: throw RpcNoSuchMethodException when run mapreduce job

2018-02-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe moved YARN-7970 to MAPREDUCE-7059:
-

Affects Version/s: (was: 3.0.0)
   3.0.0
  Component/s: (was: yarn)
   job submission
  Key: MAPREDUCE-7059  (was: YARN-7970)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Compatibility issue: throw RpcNoSuchMethodException when run mapreduce job
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Minor
>
> Running teragen fails when the client is hadoop-3.1 and the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The failure happens because 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (fs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) fs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     if (!(e.getCause() instanceof RpcNoSuchMethodException)) {
>       throw e;
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2665)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2662)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> 

[jira] [Updated] (MAPREDUCE-7059) Compatibility issue: job submission fails with RpcNoSuchMethodException when submitting to 2.x cluster

2018-02-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7059:
--
Summary: Compatibility issue: job submission fails with 
RpcNoSuchMethodException when submitting to 2.x cluster  (was: Compatibility 
issue: throw RpcNoSuchMethodException when run mapreduce job)

> Compatibility issue: job submission fails with RpcNoSuchMethodException when 
> submitting to 2.x cluster
> --
>
> Key: MAPREDUCE-7059
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Affects Versions: 3.0.0
>Reporter: Jiandan Yang 
>Priority: Minor
>
> Running teragen fails when the client is hadoop-3.1 and the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar  teragen  
> 10 /teragen
> {code}
> The failure happens because 2.8 HDFS does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in 
> JobResourceUploader#disableErasureCodingForPath, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     if (fs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) fs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     if (!(e.getCause() instanceof RpcNoSuchMethodException)) {
>       throw e;
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging 
> area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method setErasureCodingPolicy called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.DFSClient.setErasureCodingPolicy(DFSClient.java:2678)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2665)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$63.doCall(DistributedFileSystem.java:2662)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> 

[jira] [Updated] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-02-15 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7053:
--
Affects Version/s: (was: 2.7.6)
 Target Version/s: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4  (was: 3.1.0, 3.0.1, 
2.10.0, 2.9.1, 2.8.4, 2.7.6)

Actually I don't think this is needed for branch-2.7.  MAPREDUCE-5044, which 
added the thread dump support on timeout, didn't appear until 2.8.

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task, it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after it has been removed 
> from the list of known tasks but before the thread dump signal arrives, it 
> can exit with a "org.apache.hadoop.mapred.Task: Parent died." message and no 
> thread dump.
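To make the window easier to see, here is a minimal, hedged sketch of the race described above. This is not the MRAppMaster implementation; the class and method names are invented for illustration:

{code:java}
// Simplified sketch of the race: the AM forgets the task before the dump
// signal reaches it, so a heartbeat landing in that window is rejected and
// the task exits as "Parent died" without producing a thread dump.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class TimeoutRaceSketch {
  private final Set<String> knownTasks = ConcurrentHashMap.newKeySet();

  void onTaskTimeout(String taskId) {
    knownTasks.remove(taskId);     // 1. task removed from known set at once
    sendThreadDumpSignal(taskId);  // 2. dump signal travels asynchronously
    sendKillSignal(taskId);        // 3. kill follows the dump request
  }

  boolean onHeartbeat(String taskId) {
    // A heartbeat arriving between steps 1 and 2 finds the task unknown,
    // so the task concludes its parent died and exits before being dumped.
    return knownTasks.contains(taskId);
  }

  private void sendThreadDumpSignal(String taskId) { /* RPC to the NM */ }
  private void sendKillSignal(String taskId) { /* RPC to the NM */ }
}
{code}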



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7052) TestFixedLengthInputFormat#testFormatCompressedIn is flaky

2018-02-15 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-7052:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.6
   2.8.4
   2.9.1
   2.10.0
   3.0.1
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks, [~pbacsko]!  I committed this to trunk, branch-3.1, branch-3.0, 
branch-3.0.1, branch-2, branch-2.9, branch-2.8, and branch-2.7.

> TestFixedLengthInputFormat#testFormatCompressedIn is flaky
> --
>
> Key: MAPREDUCE-7052
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7052
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>
> Attachments: MAPREDUCE-7052-001.patch, MAPREDUCE-7052-002.patch
>
>
> Sometimes the test case TestFixedLengthInputFormat#testFormatCompressedIn can 
> fail with the following error:
> {noformat}
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at 
> org.apache.hadoop.mapred.TestFixedLengthInputFormat.runRandomTests(TestFixedLengthInputFormat.java:322)
>   at 
> org.apache.hadoop.mapred.TestFixedLengthInputFormat.testFormatCompressedIn(TestFixedLengthInputFormat.java:90)
> {noformat}
> *Root cause:* under special circumstances, the following line can return a 
> huge number:
> {noformat}
>   // Test a split size that is less than record len
>   numSplits = (int)(fileSize/Math.floor(recordLength/2));
> {noformat}
> For example, let {{seed}} be 2026428718. This causes {{recordLength}} to be 1 
> at iteration 19. The integer division {{recordLength/2}} then yields 0, 
> {{Math.floor(0)}} returns 0, and dividing {{fileSize}} by zero produces 
> positive infinity. Casting that to {{int}} yields {{Integer.MAX_VALUE}}. 
> Eventually we get an OOME because the test wants to create a huge 
> {{InputSplit}} array.
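A self-contained demonstration of the arithmetic above (the values are assumed for illustration, and SplitOverflowDemo is not part of the test):

{code:java}
// Shows how (int)(fileSize/Math.floor(recordLength/2)) saturates to
// Integer.MAX_VALUE when recordLength == 1.
public class SplitOverflowDemo {
  public static void main(String[] args) {
    long fileSize = 1024;   // any positive file size reproduces it
    int recordLength = 1;   // the problematic value from the failing seed

    // recordLength/2 is integer division -> 0; Math.floor(0) -> 0.0;
    // fileSize/0.0 -> +Infinity; the narrowing cast saturates the infinity.
    int numSplits = (int) (fileSize / Math.floor(recordLength / 2));

    System.out.println(numSplits);                        // 2147483647
    System.out.println(numSplits == Integer.MAX_VALUE);   // true
  }
}
{code}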



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7053) Timed out tasks can fail to produce thread dump

2018-02-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366266#comment-16366266
 ] 

Jason Lowe commented on MAPREDUCE-7053:
---

Thanks for the reviews! Here's the equivalent patch for branch-2.  There needs 
to be a separate one for branch-2.7 as well.

> Timed out tasks can fail to produce thread dump
> ---
>
> Key: MAPREDUCE-7053
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7053
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: MAPREDUCE-7053-branch-2.001.patch, 
> MAPREDUCE-7053.001.patch
>
>
> TestMRJobs#testThreadDumpOnTaskTimeout has been failing sporadically 
> recently.  When the AM times out a task, it immediately removes it from the 
> list of known tasks and then connects to the NM to request a thread dump 
> followed by a kill.  If the task heartbeats in after it has been removed 
> from the list of known tasks but before the thread dump signal arrives, it 
> can exit with a "org.apache.hadoop.mapred.Task: Parent died." message and no 
> thread dump.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org


