[jira] [Updated] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-05-30 Thread Ahmed Hussein (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7208:
-
Attachment: MAPREDUCE-7208.001.patch

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7208.001.patch, smoothing-exponential.md
>
>
> By default, MR uses LegacyTaskRuntimeEstimator to get an estimate of the 
> runtime.  The estimator does not adjust dynamically to the progress rate of 
> the tasks. On the other hand, the behavior of the existing alternative, 
> "ExponentiallySmoothedTaskRuntimeEstimator", is unpredictable.
>  
> There are several dimensions along which to improve the exponential 
> implementation:
>  # Exponential smoothing needs a warmup period. Otherwise, the estimate will 
> be affected by the initial values.
>  # Using a single smoothing factor (Lambda) does not work well for all the 
> tasks. To increase the level of smoothing across the majority of tasks, we 
> need to give a range of flexibility to dynamically adjust the smoothing 
> factor based on the history of the task progress.
>  # Design-wise, it is better to separate the statistical model from 
> the MR interface. We need to have a way to evaluate estimators statistically, 
> without the need to run MR. For example, an estimator can be evaluated as a 
> black box by using a stream of raw data as input and testing the accuracy of 
> the generated stream of estimates.
>  # The exponential estimator speculates frequently and fails to detect 
> slowing tasks. As a result, a taskAttempt that does not make any progress 
> won't trigger a new speculation.
>  
> The file [^smoothing-exponential.md] describes how simple exponential 
> smoothing works.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-05-30 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852022#comment-16852022
 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
--

 

[~jeagles], [~tgraves], [~vinodkv], [~nroberts]

I had some issues using {{ExponentiallySmoothedTaskRuntimeEstimator}}. I 
investigated and implemented a new estimator that addresses some of the issues 
with the existing exponentially smoothed estimator. Do you mind taking a look at 
the suggested fixes and implementations?

 

 *{{SimpleExponentialTaskRuntimeEstimator}} (new) vs. 
{{ExponentiallySmoothedTaskRuntimeEstimator}} (old)*
 # The new estimator follows basic exponential smoothing.
 # The new estimator does not return an estimate for the first few cycles. This 
increases the accuracy of estimation, especially for long-running tasks.
 # The new estimator detects tasks that are slowing down. The old estimator 
fails to detect such scenarios.
 # The new estimator detects stalled tasks. The old estimator will not launch 
any speculative attempts when an attempt has a sharp slowdown.

*Is the default speculator affected?*
 * The speculator is still using the {{LegacyTaskRuntimeEstimator}} by default.
 * The existing implementation uses the statistics mean as the value of 
{{estimatedNewAttemptRuntime()}}. This causes frequent speculation, as the 
smallest difference between the {{estimatedRuntime}} and the mean will create a 
new speculative attempt. I changed the implementation of 
{{estimatedNewAttemptRuntime()}} so that it uses (mean + a small delta).
 * I created a JUnit test, {{TestSpeculativeExecOnCluster}}, that verifies the 
speculator running on {{MiniMRYarnCluster}}. The test case can also be used for 
the old estimators.
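
To make the (mean + a small delta) change concrete, here is a minimal sketch. It is not the code from the patch: the class name, the method names, and the 5% margin are hypothetical, and the speculation check is simplified to a single comparison. It only shows why comparing against the bare mean speculates on the smallest difference.

```java
// Hypothetical illustration; not the code from MAPREDUCE-7208.001.patch.
final class NewAttemptThresholdSketch {
  // Assumed margin: 5% of the mean. The actual delta is not stated in this thread.
  static final double ASSUMED_DELTA_FRACTION = 0.05;

  /** Old behaviour as described: a new attempt is expected to take exactly the mean. */
  static double newAttemptRuntimeMeanOnly(double meanMs) {
    return meanMs;
  }

  /** New behaviour as described: the mean plus a small delta. */
  static double newAttemptRuntimeWithDelta(double meanMs) {
    return meanMs + meanMs * ASSUMED_DELTA_FRACTION;
  }

  /** Simplified check: speculate only if the running attempt looks slower than a fresh one. */
  static boolean shouldSpeculate(double estimatedRuntimeMs, double newAttemptRuntimeMs) {
    return estimatedRuntimeMs > newAttemptRuntimeMs;
  }
}
```

With a mean of 10000 ms, an attempt estimated at 10001 ms triggers speculation under the mean-only rule but not once the margin is added.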

*Tuning parameters:*
 * {{job.task.estimator.simple.exponential.smooth.lambda-ms}}: The lambda value 
in the smoothing function of the task estimator.
 * {{job.task.estimator.simple.exponential.smooth.stagnated-ms}}: The window 
length in the simple exponential smoothing after which a task attempt is 
considered stagnated. This allows the speculator to detect stalled progress.
 * {{job.task.estimator.simple.exponential.smooth.skip-initials}}: The number 
of initial readings that the estimator ignores before giving a prediction. 
Simple smoothing needs several iterations before adjusting and returning good 
estimates.  The skip-initials parameter instructs the estimator to return 
"no-information" until the number of progress updates reaches that value.

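As a rough picture of how the three parameters interact, here is a minimal sketch of simple exponential smoothing over a task's progress rate. It is not the code from the patch. The class and method names are hypothetical, and the time-weighted factor alpha = 1 - exp(-dt / lambda) is an assumption about how a lambda expressed in milliseconds could be applied.

```java
// Hypothetical sketch; names and formulas are assumptions, not the patch's code.
final class SmoothedProgressSketch {
  private final double lambdaMs;   // assumed: time constant of the smoothing
  private final int skipInitials;  // readings to ignore before giving an estimate
  private final long stagnatedMs;  // window without progress that counts as stalled

  private int readings = 0;
  private long lastTimeMs;
  private double lastProgress;
  private long lastChangeMs;
  private double smoothedRate = Double.NaN; // progress units per millisecond

  SmoothedProgressSketch(double lambdaMs, int skipInitials, long stagnatedMs) {
    this.lambdaMs = lambdaMs;
    this.skipInitials = skipInitials;
    this.stagnatedMs = stagnatedMs;
  }

  /** Feeds one progress reading (progress in [0, 1]) taken at nowMs. */
  void update(long nowMs, double progress) {
    if (readings == 0) {
      // The first reading only sets the baseline.
      lastTimeMs = nowMs;
      lastProgress = progress;
      lastChangeMs = nowMs;
      readings = 1;
      return;
    }
    long dt = nowMs - lastTimeMs;
    if (dt <= 0) {
      return; // ignore duplicate or out-of-order timestamps
    }
    double rate = (progress - lastProgress) / dt;
    if (Double.isNaN(smoothedRate)) {
      smoothedRate = rate; // first rate sample seeds the forecast
    } else {
      double alpha = 1.0 - Math.exp(-dt / lambdaMs);
      smoothedRate = alpha * rate + (1.0 - alpha) * smoothedRate;
    }
    if (progress > lastProgress) {
      lastChangeMs = nowMs;
    }
    lastTimeMs = nowMs;
    lastProgress = progress;
    readings++;
  }

  /** Remaining runtime in ms, or -1 for "no-information". */
  double estimateRemainingMs() {
    if (readings <= skipInitials || Double.isNaN(smoothedRate) || smoothedRate <= 0.0) {
      return -1.0;
    }
    return (1.0 - lastProgress) / smoothedRate;
  }

  /** True when no progress has been observed for longer than the stagnation window. */
  boolean isStagnated(long nowMs) {
    return readings > 0 && nowMs - lastChangeMs > stagnatedMs;
  }
}
```

In this sketch the warmup and the stall check are independent: an attempt can be reported as stagnated even while the estimate is still "no-information".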
 

 




[jira] [Created] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-05-29 Thread Ahmed Hussein (JIRA)
Ahmed Hussein created MAPREDUCE-7208:


 Summary: Tuning TaskRuntimeEstimator 
 Key: MAPREDUCE-7208
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein
 Attachments: smoothing-exponential.md

By default, MR uses LegacyTaskRuntimeEstimator to get an estimate of the 
runtime.  The estimator does not adjust dynamically to the progress rate of the 
tasks. On the other hand, the behavior of the existing alternative, 
"ExponentiallySmoothedTaskRuntimeEstimator", is unpredictable.

 

There are several dimensions along which to improve the exponential 
implementation:
 # Exponential smoothing needs a warmup period. Otherwise, the estimate will be 
affected by the initial values.
 # Using a single smoothing factor (Lambda) does not work well for all the 
tasks. To increase the level of smoothing across the majority of tasks, we need 
to give a range of flexibility to dynamically adjust the smoothing factor based 
on the history of the task progress.
 # Design-wise, it is better to separate the statistical model from the 
MR interface. We need to have a way to evaluate estimators statistically, 
without the need to run MR. For example, an estimator can be evaluated as a 
black box by using a stream of raw data as input and testing the accuracy of 
the generated stream of estimates.
 # The exponential estimator speculates frequently and fails to detect slowing 
tasks. As a result, a taskAttempt that does not make any progress won't trigger 
a new speculation.

 

The file [^smoothing-exponential.md] describes how simple exponential smoothing 
works.
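
The black-box evaluation mentioned in the third item could look roughly like the sketch below. Everything in it is hypothetical: the {{Forecaster}} interface, the linear baseline, and the use of mean absolute error as the accuracy measure are assumptions chosen for illustration, not part of MR or of the attached patch.

```java
// Hypothetical sketch of evaluating an estimator without running MR.
final class EstimatorEvaluationSketch {
  /** Minimal model interface, decoupled from the MR classes. */
  interface Forecaster {
    void update(long timeMs, double progress);

    /** Estimated total runtime in ms; a negative value means "no information". */
    double estimateTotalRuntimeMs();
  }

  /** Baseline model: linear extrapolation from the latest reading. */
  static final class LinearForecaster implements Forecaster {
    private long timeMs;
    private double progress;

    @Override
    public void update(long timeMs, double progress) {
      this.timeMs = timeMs;
      this.progress = progress;
    }

    @Override
    public double estimateTotalRuntimeMs() {
      return progress > 0.0 ? timeMs / progress : -1.0;
    }
  }

  /**
   * Replays a recorded stream of (time, progress) readings and returns the mean
   * absolute error of the estimates against the known runtime. Readings for
   * which the model gives no estimate are skipped; NaN if none gave one.
   */
  static double meanAbsoluteError(Forecaster model, long[] timesMs,
      double[] progress, double actualRuntimeMs) {
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < timesMs.length; i++) {
      model.update(timesMs[i], progress[i]);
      double estimate = model.estimateTotalRuntimeMs();
      if (estimate >= 0.0) {
        sum += Math.abs(estimate - actualRuntimeMs);
        count++;
      }
    }
    return count == 0 ? Double.NaN : sum / count;
  }
}
```

Two models can then be compared by replaying the same recorded stream through each and comparing the resulting errors.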

 

 






[jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-11-04 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966707#comment-16966707
 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
--

The failed test case is not related to the patch.

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7208.001.patch, MAPREDUCE-7208.002.patch, 
> MAPREDUCE-7208.003.patch, MAPREDUCE-7208.004.patch, smoothing-exponential.md
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-11-01 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965057#comment-16965057
 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
--

{{TestJobSplitWriterWithEC}} seems unrelated to the patch. I will investigate 
further before confirming that it is a flaky test.




[jira] [Updated] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-11-04 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7208:
-
Attachment: MAPREDUCE-7208-branch-2.10.001.patch

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7208-branch-2.10.001.patch, 
> MAPREDUCE-7208.001.patch, MAPREDUCE-7208.002.patch, MAPREDUCE-7208.003.patch, 
> MAPREDUCE-7208.004.patch, smoothing-exponential.md
>
>



[jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-11-05 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967913#comment-16967913
 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
--

Thanks [~jeagles].

I reviewed the errors reported for the 2.10 patch. They come from unrelated 
unit tests that timed out.

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0
>
> Attachments: MAPREDUCE-7208-branch-2.10.001.patch, 
> MAPREDUCE-7208-branch-2.10.002.patch, MAPREDUCE-7208.001.patch, 
> MAPREDUCE-7208.002.patch, MAPREDUCE-7208.003.patch, MAPREDUCE-7208.004.patch, 
> smoothing-exponential.md
>
>



[jira] [Updated] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-11-05 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7208:
-
Attachment: MAPREDUCE-7208-branch-2.10.002.patch

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7208-branch-2.10.001.patch, 
> MAPREDUCE-7208-branch-2.10.002.patch, MAPREDUCE-7208.001.patch, 
> MAPREDUCE-7208.002.patch, MAPREDUCE-7208.003.patch, MAPREDUCE-7208.004.patch, 
> smoothing-exponential.md
>
>



[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-11-06 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968390#comment-16968390
 ] 

Ahmed Hussein commented on MAPREDUCE-7169:
--

[~BilwaST], what is the current state of the patch? Are you still working on 
this issue?

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
> I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I have not seen any 
> place mentioning that it avoids the same node. This is unreasonable: if the 
> node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-10-30 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7208:
-
Attachment: MAPREDUCE-7208.003.patch

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7208.001.patch, MAPREDUCE-7208.002.patch, 
> MAPREDUCE-7208.003.patch, smoothing-exponential.md
>
>



[jira] [Updated] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-10-30 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7208:
-
Attachment: MAPREDUCE-7208.004.patch




[jira] [Updated] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-10-29 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7208:
-
Attachment: MAPREDUCE-7208.002.patch

> Tuning TaskRuntimeEstimator 
> 
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7208.001.patch, MAPREDUCE-7208.002.patch, 
> smoothing-exponential.md
>
>



[jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

2019-10-29 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962437#comment-16962437
 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
--

Thanks [~jeagles]. I looked at the test cases:
* {{hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp}} is a related test 
case, and it was failing because I changed the threshold of the estimate that 
triggers a new speculative task. I fixed that default behavior in the new patch.
* {{hadoop.mapred.TestLocalMRNotification}} and 
{{hadoop.mapreduce.v2.TestMROldApiJobs}} seem to be random failures. They pass 
successfully on my local machine.




[jira] [Created] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2019-12-19 Thread Ahmed Hussein (Jira)
Ahmed Hussein created MAPREDUCE-7252:


 Summary: Handling 0 progress in SimpleExponential task runtime 
estimator
 Key: MAPREDUCE-7252
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not to 
handle the corner cases where the delta progress is 0. As a result, the 
forecast will be NaN or Inf which messes up the subsequent forecast values.
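
For illustration only, a minimal sketch of how a zero delta progress can be 
kept out of the smoothed value. It assumes each raw sample is "elapsed time 
per unit of progress", so a zero delta would mean dividing by 0. The class and 
method names (GuardedSmootherSketch, update, forecastRemainingMs) are 
hypothetical and this is not the code in the Hadoop estimator or in the patch.

```java
/** Hypothetical sketch; not the actual Hadoop estimator class. */
final class GuardedSmootherSketch {
  private final double alpha;                     // smoothing factor in (0, 1]
  private double smoothedMsPerUnit = Double.NaN;  // NaN means "no forecast yet"
  private double lastProgress = 0.0;              // progress in [0, 1]
  private long lastTimeMs;

  GuardedSmootherSketch(double alpha, long startTimeMs) {
    if (!(alpha > 0.0 && alpha <= 1.0)) {
      throw new IllegalArgumentException("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
    this.lastTimeMs = startTimeMs;
  }

  /** Feeds one progress report into the smoothed estimate. */
  void update(long nowMs, double progress) {
    double deltaProgress = progress - lastProgress;
    long deltaTimeMs = nowMs - lastTimeMs;
    // Guard: with no (or negative) progress the sample would be Inf or NaN.
    // Skip it and keep the old timestamp, so the stalled time is charged to
    // the next sample that does show progress.
    if (deltaProgress <= 0.0 || deltaTimeMs <= 0) {
      return;
    }
    double sample = deltaTimeMs / deltaProgress;
    smoothedMsPerUnit = Double.isNaN(smoothedMsPerUnit)
        ? sample
        : alpha * sample + (1.0 - alpha) * smoothedMsPerUnit;
    lastProgress = progress;
    lastTimeMs = nowMs;
  }

  /** Estimated remaining time in ms, or -1 if no sample has been accepted. */
  double forecastRemainingMs() {
    return Double.isNaN(smoothedMsPerUnit)
        ? -1.0
        : smoothedMsPerUnit * (1.0 - lastProgress);
  }
}
```

Skipping the sample is only one possible policy. A real estimator would also 
need a way to flag a task that never progresses, which this sketch does not 
attempt.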






[jira] [Work started] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2019-12-19 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-7252 started by Ahmed Hussein.

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.






[jira] [Updated] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2019-12-20 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7252:
-
Attachment: MAPREDUCE-7252.001.patch

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7252.001.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.






[jira] [Updated] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2019-12-20 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7252:
-
Status: Patch Available  (was: In Progress)







[jira] [Updated] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-23 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.009.patch

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch
>
>
> {{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. 
> The problem is due to an incorrect while condition that evaluates to false 
> on the first iteration of the loop.
> {code:java}
>   scheduledExecutor.shutdown();
>   boolean interrupted = false;
>   long currentTime = System.currentTimeMillis();
>   while (!scheduledExecutor.isShutdown()
>       && System.currentTimeMillis() > currentTime + 1000l && !interrupted) {
>     try {
>       Thread.sleep(20);
>     } catch (InterruptedException e) {
>       interrupted = true;
>     }
>   }
> {code}
> The expression "{{System.currentTimeMillis() > currentTime + 1000L}}" is 
> false because currentTime was just initialized with 
> {{System.currentTimeMillis()}}. As a result, the thread won't wait until 
> the executor is terminated. Instead, it will force a shutdown immediately.
> *TestMRIntermediateDataEncryption is failing in precommit builds*
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}
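
For illustration, a minimal sketch of a bounded wait for the executor 
described in the issue above, assuming the intent is to give running tasks up 
to a timeout before forcing a shutdown. It relies on the JDK's 
ExecutorService.awaitTermination instead of a hand-written polling loop. The 
class and method names (ExecutorShutdownSketch, shutdownAndWait) are 
hypothetical, and this is not the code in the attached patches.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the actual JobHistory.serviceStop code. */
final class ExecutorShutdownSketch {

  /**
   * Stops accepting new tasks, then waits up to timeoutMs for running tasks
   * to finish. Returns true if the executor terminated within the timeout.
   */
  static boolean shutdownAndWait(ExecutorService executor, long timeoutMs) {
    executor.shutdown();
    boolean terminated = false;
    try {
      // Blocks until all tasks complete, the timeout elapses, or the
      // calling thread is interrupted, whichever happens first.
      terminated = executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the caller.
      Thread.currentThread().interrupt();
    }
    if (!terminated) {
      // Timed out or interrupted: cancel whatever is still running.
      List<Runnable> neverStarted = executor.shutdownNow();
      System.err.println("Forced shutdown; tasks never started: "
          + neverStarted.size());
    }
    return terminated;
  }
}
```

Note that isShutdown() already returns true once shutdown() has been called, 
so completion has to be tracked with isTerminated() or awaitTermination().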






[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-23 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022727#comment-17022727
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

There is an open Jira MAPREDUCE-7259 to fix 
{{TestSpeculativeExecutionWithMRApp}}







[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-23 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022662#comment-17022662
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

I found a workaround to prevent the JUnit test from hanging. I added that change to 
[^MAPREDUCE-7079.009.patch]







[jira] [Commented] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024627#comment-17024627
 ] 

Ahmed Hussein commented on MAPREDUCE-7262:
--

{quote}waitForState(TaskAttempt attempt, TaskAttemptState...finalStates) was 
not updated. I assume this is because we assume MAPREDUCE-7259 will remove this 
function. Can you confirm? If it turns out MAPREDUCE-7259 will continue to use 
this, it will need to be updated.{quote}
Yes, MAPREDUCE-7259 will remove the method.

{quote}public void waitForState(Task task, TaskState finalState)
Seems to have a few problems. One is that it continues to use 
System.out.println. Can you comment on that?{quote}
My bad. I did not remove that print statement. I will update the patch.
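
For illustration, a minimal sketch of the polling change discussed in this 
issue: a short interval with more retries and no per-iteration printing. With 
50ms and 200 retries the worst-case wait stays at 10 seconds (the same as 
500ms and 20 retries), but a state change is noticed within about 50ms. The 
names (PollingWaitSketch, the Supplier-based signature) are hypothetical and 
this is not the MRApp code from the patch.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; not the actual MRApp helper. */
final class PollingWaitSketch {

  /**
   * Polls currentState until it equals expected or maxRetries is exhausted.
   * Returns the last observed state so the caller can assert on it.
   */
  static <T> T waitForState(Supplier<T> currentState, T expected,
      long intervalMs, int maxRetries) {
    T state = currentState.get();
    int retries = 0;
    while (!expected.equals(state) && retries++ < maxRetries) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt status and stop waiting.
        Thread.currentThread().interrupt();
        break;
      }
      state = currentState.get();
    }
    return state;
  }
}
```

A test would then assert once on the returned value, for example comparing it 
with the expected state, instead of printing the state on every iteration.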

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch
>
>
> MRApp has a set of methods used as helpers in test cases, such as 
> {{waitForInternalState(TA)}}, {{waitForState(TA)}}, {{waitForState(Job)}}, etc.
> When the condition fails, the thread sleeps for a minimum of 500ms before 
> rechecking the new state of the Job/TA.
> Example:
> {code:java}
>   public void waitForState(Task task, TaskState finalState) throws Exception {
>     int timeoutSecs = 0;
>     TaskReport report = task.getReport();
>     while (!finalState.equals(report.getTaskState()) &&
>         timeoutSecs++ < 20) {
>       System.out.println("Task State for " + task.getID() + " is : "
>           + report.getTaskState() + " Waiting for state : " + finalState
>           + "   progress : " + report.getProgress());
>       report = task.getReport();
>       Thread.sleep(500);
>     }
>     System.out.println("Task State is : " + report.getTaskState());
>     Assert.assertEquals("Task state is not correct (timedout)", finalState,
>         report.getTaskState());
>   }
> {code}
> I suggest reducing the polling interval from 500ms to 50ms while increasing 
> the number of retries from 20 to 200. This will potentially make the test 
> cases run faster. Also, the {{System.out}} calls need to be removed because 
> dumping the current state on every iteration adds no information.
> A tentative list of JUnit tests affected by the change:
> {code:bash}
> Method
> waitForInternalState(JobImpl, JobStateInternal)
> Found usages  (12 usages found)
> org.apache.hadoop.mapreduce.v2.app  (10 usages found)
> TestJobEndNotifier  (3 usages found)
> testNotificationOnLastRetry(boolean)  (1 usage found)
> 214 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testAbsentNotificationOnNotLastRetryUnregistrationFailure()  (1 
> usage found)
> 256 app.waitForInternalState(job, JobStateInternal.REBOOT);
> testNotificationOnLastRetryUnregistrationFailure()  (1 usage 
> found)
> 289 app.waitForInternalState(job, JobStateInternal.REBOOT);
> TestKill  (5 usages found)
> testKillJob()  (1 usage found)
> 70 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTask()  (1 usage found)
> 108 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTaskWait()  (1 usage found)
> 219 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobAfterTA_DONE()  (1 usage found)
> 266 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobBeforeTA_DONE()  (1 usage found)
> 316 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> TestMRApp  (2 usages found)
> testJobSuccess()  (1 usage found)
> 494 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testJobRebootOnLastRetryOnUnregistrationFailure()  (1 usage found)
> 542 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.REBOOT);
> org.apache.hadoop.mapreduce.v2.app.rm  (2 usages found)
> TestRMContainerAllocator  (2 usages found)
> testReportedAppProgress()  (1 usage found)
> 1050 mrApp.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testReportedAppProgressWithOnlyMaps()  (1 usage found)
> 1202 mrApp.waitForInternalState((JobImpl)job, 
> JobStateInternal.RUNNING);
> 

[jira] [Commented] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-27 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024547#comment-17024547
 ] 

Ahmed Hussein commented on MAPREDUCE-7259:
--

[~jeagles], I moved the optimizations of the MRApp methods into a different 
Jira (MAPREDUCE-7262). Changing the intervals of polling the App/Task status 
reduces the elapsed time of test cases significantly.

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch, 
> MAPREDUCE-7259.003.patch, MAPREDUCE-7259.004.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The problem happens 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned because it is sensitive to races 
> and timing between the speculator and the tasks. It works fine for the 
> legacy estimator because the estimate is based on the start-end rate. 






[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Attachment: MAPREDUCE-7261.002.patch

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7261.002.patch, 
> MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch
>
>
> MRApp has a set of methods used as helpers in test cases, such as 
> {{waitForInternalState(TA)}}, {{waitForState(TA)}}, {{waitForState(Job)}}, etc.
> When the condition fails, the thread sleeps for a minimum of 500ms before 
> rechecking the new state of the Job/TA.
> Example:
> {code:java}
>   public void waitForState(Task task, TaskState finalState) throws Exception {
>     int timeoutSecs = 0;
>     TaskReport report = task.getReport();
>     while (!finalState.equals(report.getTaskState()) &&
>         timeoutSecs++ < 20) {
>       System.out.println("Task State for " + task.getID() + " is : "
>           + report.getTaskState() + " Waiting for state : " + finalState
>           + "   progress : " + report.getProgress());
>       report = task.getReport();
>       Thread.sleep(500);
>     }
>     System.out.println("Task State is : " + report.getTaskState());
>     Assert.assertEquals("Task state is not correct (timedout)", finalState,
>         report.getTaskState());
>   }
> {code}
> I suggest reducing the polling interval from 500ms to 50ms while increasing 
> the number of retries from 20 to 200. This will potentially make the test 
> cases run faster. Also, the {{System.out}} calls need to be removed because 
> dumping the current state on every iteration adds no information.
> A tentative list of JUnit tests affected by the change:
> {code:bash}
> Method
> waitForInternalState(JobImpl, JobStateInternal)
> Found usages  (12 usages found)
> org.apache.hadoop.mapreduce.v2.app  (10 usages found)
> TestJobEndNotifier  (3 usages found)
> testNotificationOnLastRetry(boolean)  (1 usage found)
> 214 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testAbsentNotificationOnNotLastRetryUnregistrationFailure()  (1 
> usage found)
> 256 app.waitForInternalState(job, JobStateInternal.REBOOT);
> testNotificationOnLastRetryUnregistrationFailure()  (1 usage 
> found)
> 289 app.waitForInternalState(job, JobStateInternal.REBOOT);
> TestKill  (5 usages found)
> testKillJob()  (1 usage found)
> 70 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTask()  (1 usage found)
> 108 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTaskWait()  (1 usage found)
> 219 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobAfterTA_DONE()  (1 usage found)
> 266 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobBeforeTA_DONE()  (1 usage found)
> 316 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> TestMRApp  (2 usages found)
> testJobSuccess()  (1 usage found)
> 494 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testJobRebootOnLastRetryOnUnregistrationFailure()  (1 usage found)
> 542 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.REBOOT);
> org.apache.hadoop.mapreduce.v2.app.rm  (2 usages found)
> TestRMContainerAllocator  (2 usages found)
> testReportedAppProgress()  (1 usage found)
> 1050 mrApp.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testReportedAppProgressWithOnlyMaps()  (1 usage found)
> 1202 mrApp.waitForInternalState((JobImpl)job, 
> JobStateInternal.RUNNING);
> --
> Method
> waitForState(TaskAttempt, TaskAttemptState)
> Found usages  (72 usages found)
> org.apache.hadoop.mapreduce.v2  (2 usages found)
> TestSpeculativeExecutionWithMRApp  (2 usages found)
> testSpeculateSuccessfulWithoutUpdateEvents()  (1 usage found)
> 212 app.waitForState(taskAttempt.getValue(), 
> TaskAttemptState.SUCCEEDED);
> testSpeculateSuccessfulWithUpdateEvents()  (1 usage found)
> 275 app.waitForState(taskAttempt.getValue(), 
> 

[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Attachment: MAPREDUCE-7262-elapsedTimes.pdf

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7262-elapsedTimes.pdf
>
>
> MRApp has a set of methods used as helpers in test cases, such as 
> {{waitForInternalState(TA)}}, {{waitForState(TA)}}, {{waitForState(Job)}}, etc.
> When the condition fails, the thread sleeps for a minimum of 500ms before 
> rechecking the new state of the Job/TA.
> Example:
> {code:java}
>   public void waitForState(Task task, TaskState finalState) throws Exception {
>     int timeoutSecs = 0;
>     TaskReport report = task.getReport();
>     while (!finalState.equals(report.getTaskState()) &&
>         timeoutSecs++ < 20) {
>       System.out.println("Task State for " + task.getID() + " is : "
>           + report.getTaskState() + " Waiting for state : " + finalState
>           + "   progress : " + report.getProgress());
>       report = task.getReport();
>       Thread.sleep(500);
>     }
>     System.out.println("Task State is : " + report.getTaskState());
>     Assert.assertEquals("Task state is not correct (timedout)", finalState,
>         report.getTaskState());
>   }
> {code}
> I suggest reducing the polling interval from 500ms to 50ms while increasing 
> the number of retries from 20 to 200. This will potentially make the test 
> cases run faster. Also, the {{System.out}} calls need to be removed because 
> dumping the current state on every iteration adds no information.
> A tentative list of JUnit tests affected by the change:
> {code:bash}
> Method
> waitForInternalState(JobImpl, JobStateInternal)
> Found usages  (12 usages found)
> org.apache.hadoop.mapreduce.v2.app  (10 usages found)
> TestJobEndNotifier  (3 usages found)
> testNotificationOnLastRetry(boolean)  (1 usage found)
> 214 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testAbsentNotificationOnNotLastRetryUnregistrationFailure()  (1 
> usage found)
> 256 app.waitForInternalState(job, JobStateInternal.REBOOT);
> testNotificationOnLastRetryUnregistrationFailure()  (1 usage 
> found)
> 289 app.waitForInternalState(job, JobStateInternal.REBOOT);
> TestKill  (5 usages found)
> testKillJob()  (1 usage found)
> 70 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTask()  (1 usage found)
> 108 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTaskWait()  (1 usage found)
> 219 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobAfterTA_DONE()  (1 usage found)
> 266 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobBeforeTA_DONE()  (1 usage found)
> 316 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> TestMRApp  (2 usages found)
> testJobSuccess()  (1 usage found)
> 494 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testJobRebootOnLastRetryOnUnregistrationFailure()  (1 usage found)
> 542 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.REBOOT);
> org.apache.hadoop.mapreduce.v2.app.rm  (2 usages found)
> TestRMContainerAllocator  (2 usages found)
> testReportedAppProgress()  (1 usage found)
> 1050 mrApp.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testReportedAppProgressWithOnlyMaps()  (1 usage found)
> 1202 mrApp.waitForInternalState((JobImpl)job, 
> JobStateInternal.RUNNING);
> --
> Method
> waitForState(TaskAttempt, TaskAttemptState)
> Found usages  (72 usages found)
> org.apache.hadoop.mapreduce.v2  (2 usages found)
> TestSpeculativeExecutionWithMRApp  (2 usages found)
> testSpeculateSuccessfulWithoutUpdateEvents()  (1 usage found)
> 212 app.waitForState(taskAttempt.getValue(), 
> TaskAttemptState.SUCCEEDED);
> testSpeculateSuccessfulWithUpdateEvents()  (1 usage found)
> 275 app.waitForState(taskAttempt.getValue(), 
> TaskAttemptState.SUCCEEDED);
> 

[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Attachment: (was: MAPREDUCE-7261.002.patch)

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7262-elapsedTimes.pdf, 
> MAPREDUCE-7262.001.patch, MAPREDUCE-7262.002.patch
>
>
> MRApp has a set of methods used as helpers in test cases such as: 
> {{waitForInternalState(TA)}}, {{waitForState(TA)}}, {{waitForState(Job)}}..etc
> When the condition fails, the thread sleeps for a minimum of 500ms before 
> rechecking the new state of the Job/TA.
> Example:
> {code:java}
>   public void waitForState(Task task, TaskState finalState) throws Exception {
> int timeoutSecs = 0;
> TaskReport report = task.getReport();
> while (!finalState.equals(report.getTaskState()) &&
> timeoutSecs++ < 20) {
>   System.out.println("Task State for " + task.getID() + " is : "
>   + report.getTaskState() + " Waiting for state : " + finalState
>   + "   progress : " + report.getProgress());
>   report = task.getReport();
>   Thread.sleep(500);
> }
> System.out.println("Task State is : " + report.getTaskState());
> Assert.assertEquals("Task state is not correct (timedout)", finalState,
> report.getTaskState());
>   }
> {code}
> I suggest reducing the sleep interval from 500ms to 50ms while increasing the 
> number of retries from 20 to 200, which keeps the overall timeout at 10 
> seconds. This will potentially make the test cases run faster. Also, the 
> {{System.out}} calls should be removed because dumping the current state on 
> every iteration adds no useful information.
> A tentative list of JUnit tests affected by the change:
> {code:bash}
> Method
> waitForInternalState(JobImpl, JobStateInternal)
> Found usages  (12 usages found)
> org.apache.hadoop.mapreduce.v2.app  (10 usages found)
> TestJobEndNotifier  (3 usages found)
> testNotificationOnLastRetry(boolean)  (1 usage found)
> 214 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testAbsentNotificationOnNotLastRetryUnregistrationFailure()  (1 
> usage found)
> 256 app.waitForInternalState(job, JobStateInternal.REBOOT);
> testNotificationOnLastRetryUnregistrationFailure()  (1 usage 
> found)
> 289 app.waitForInternalState(job, JobStateInternal.REBOOT);
> TestKill  (5 usages found)
> testKillJob()  (1 usage found)
> 70 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTask()  (1 usage found)
> 108 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTaskWait()  (1 usage found)
> 219 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobAfterTA_DONE()  (1 usage found)
> 266 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobBeforeTA_DONE()  (1 usage found)
> 316 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> TestMRApp  (2 usages found)
> testJobSuccess()  (1 usage found)
> 494 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testJobRebootOnLastRetryOnUnregistrationFailure()  (1 usage found)
> 542 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.REBOOT);
> org.apache.hadoop.mapreduce.v2.app.rm  (2 usages found)
> TestRMContainerAllocator  (2 usages found)
> testReportedAppProgress()  (1 usage found)
> 1050 mrApp.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testReportedAppProgressWithOnlyMaps()  (1 usage found)
> 1202 mrApp.waitForInternalState((JobImpl)job, 
> JobStateInternal.RUNNING);
> --
> Method
> waitForState(TaskAttempt, TaskAttemptState)
> Found usages  (72 usages found)
> org.apache.hadoop.mapreduce.v2  (2 usages found)
> TestSpeculativeExecutionWithMRApp  (2 usages found)
> testSpeculateSuccessfulWithoutUpdateEvents()  (1 usage found)
> 212 app.waitForState(taskAttempt.getValue(), 
> TaskAttemptState.SUCCEEDED);
> testSpeculateSuccessfulWithUpdateEvents()  (1 usage found)
> 275 app.waitForState(taskAttempt.getValue(), 

[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Status: Open  (was: Patch Available)

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7261.002.patch, 
> MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch, 
> MAPREDUCE-7262.002.patch
>
>

[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Attachment: MAPREDUCE-7262.002.patch

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7261.002.patch, 
> MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch, 
> MAPREDUCE-7262.002.patch
>
>

[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Status: Patch Available  (was: Open)

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7261.002.patch, 
> MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch, 
> MAPREDUCE-7262.002.patch
>
>

[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Attachment: MAPREDUCE-7259.005.patch

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch, 
> MAPREDUCE-7259.003.patch, MAPREDUCE-7259.004.patch, MAPREDUCE-7259.005.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The failure occurs 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned, because it is sensitive to races and 
> timing between the speculator and the tasks. It works fine with the legacy 
> estimator because that estimate is based on the start-to-end rate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Attachment: MAPREDUCE-7262.001.patch

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch
>
>

[jira] [Commented] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024502#comment-17024502
 ] 

Ahmed Hussein commented on MAPREDUCE-7262:
--

The table in MAPREDUCE-7262-elapsedTimes.pdf shows the ratio between the elapsed 
times of the affected JUnit tests before and after applying the patch.
The execution times of some test cases, such as {{TestRecovery}} and 
{{TestMRApp}}, are reduced by 50%.
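
For illustration only, the polling change proposed in the issue (a 50ms interval with up to 200 retries, and no per-iteration printing) can be sketched as a standalone helper. This is not the code from the attached patch: the class name {{PollingWait}}, the {{Supplier}}-based signature, and the returned sleep count are assumptions made so the sketch runs without Hadoop on the classpath.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the MRApp wait helpers; not the patched Hadoop code. */
final class PollingWait {
  private PollingWait() {}

  /**
   * Polls {@code current} until it equals {@code expected} or the retries run out.
   * Returns the number of sleeps performed; throws AssertionError on timeout.
   */
  static <T> int waitForState(Supplier<T> current, T expected,
      long intervalMs, int maxRetries) {
    int retries = 0;
    T state = current.get();
    while (!expected.equals(state) && retries < maxRetries) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
      retries++;
      state = current.get();
    }
    if (!expected.equals(state)) {
      throw new AssertionError("State is not correct (timed out): expected "
          + expected + " but was " + state);
    }
    return retries;
  }
}
```

A call such as {{PollingWait.waitForState(stateSupplier, expectedState, 50L, 200)}} keeps the same 10 second upper bound as 20 retries of 500ms, but a test whose state changes early returns after at most one 50ms sleep past the change instead of up to 500ms.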

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch
>
>

[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Attachment: MAPREDUCE-7262-branch-2.10.002.patch

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7262-branch-2.10.002.patch, 
> MAPREDUCE-7262-elapsedTimes.pdf, MAPREDUCE-7262.001.patch, 
> MAPREDUCE-7262.002.patch
>
>

[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-24 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023063#comment-17023063
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

The workaround to avoid the entropy problem on Linux machines is to pass the JVM 
option "{{-Djava.security.egd=file:/dev/./urandom}}" to the MRAppMaster and to 
the YarnChild processes. This could be achieved by setting the "{{JVM_OPTS}}" 
in {{mapred-default.xml}}, but that would be an invasive solution.
I opted to append the required configurations to the cluster configuration. The 
cluster configuration is passed to the Job, which in turn passes it to the task 
processes.
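
As a rough sketch of what "appending to the configuration" could look like, the snippet below adds the option to existing JVM option strings without discarding what is already set. It uses a plain {{Map}} in place of a Hadoop {{Configuration}} so that it runs standalone. The class name {{EntropyOpts}} is hypothetical, and the choice of the three property keys is an assumption about which settings reach the MRAppMaster and the task JVMs; the patch itself may do this differently.

```java
import java.util.Map;

/** Hypothetical helper; a Map stands in for the cluster configuration. */
final class EntropyOpts {
  static final String EGD_OPT = "-Djava.security.egd=file:/dev/./urandom";

  // Assumed keys for the AM and task JVM options; not confirmed by the patch.
  static final String[] OPT_KEYS = {
      "yarn.app.mapreduce.am.command-opts",
      "mapreduce.map.java.opts",
      "mapreduce.reduce.java.opts"
  };

  private EntropyOpts() {}

  /** Appends opt to the existing option string unless it is already present. */
  static String appendOpt(String existing, String opt) {
    if (existing == null || existing.trim().isEmpty()) {
      return opt;
    }
    if (existing.contains(opt)) {
      return existing;
    }
    return existing.trim() + " " + opt;
  }

  /** Applies the entropy option to every assumed key, keeping existing options. */
  static void apply(Map<String, String> conf) {
    for (String key : OPT_KEYS) {
      conf.put(key, appendOpt(conf.get(key), EGD_OPT));
    }
  }
}
```

Appending rather than overwriting matters here because a test cluster may already set heap sizes or other flags on the same keys.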

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch
>
>
> {{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. 
> The problem is due to an incorrect while condition that evaluates to false 
> on the first iteration of the loop.
> {code:java}
>  scheduledExecutor.shutdown();
>   boolean interrupted = false;
>   long currentTime = System.currentTimeMillis();
>   while (!scheduledExecutor.isShutdown()
>   && System.currentTimeMillis() > currentTime + 1000l && 
> !interrupted) {
> try {
>   Thread.sleep(20);
> } catch (InterruptedException e) {
>   interrupted = true;
> }
>   }
> {code}
> The expression "{{System.currentTimeMillis() > currentTime + 1000L}}" is 
> false because currentTime was just initialized with 
> {{System.currentTimeMillis()}}. As a result, the thread won't wait until 
> the executor is terminated. Instead, it will force a shutdown immediately.
> *TestMRIntermediateDataEncryption is failing in precommit builds*
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}
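
For illustration, one way to express the wait that the {{JobHistory.serviceStop}} loop quoted in the description above appears to intend is sketched below. It assumes the goal is to allow up to one second for already-submitted tasks to finish before forcing a shutdown; the class and method names are hypothetical, and this is not necessarily what the attached patches do. The sketch relies on {{awaitTermination}} instead of polling, because {{isShutdown()}} already returns true as soon as {{shutdown()}} has been called and so cannot be used to detect that the tasks have finished.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class GracefulStopSketch {
  /**
   * Stops the executor, waiting up to timeoutMs for running tasks.
   * Returns true if the executor terminated without a forced shutdown.
   */
  static boolean stop(ScheduledExecutorService executor, long timeoutMs) {
    executor.shutdown(); // stop accepting new tasks
    boolean terminated = false;
    try {
      terminated = executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
    if (!terminated) {
      executor.shutdownNow(); // force shutdown only after the wait has expired
    }
    return terminated;
  }

  /** Sleeps for the given time, restoring the interrupt flag if interrupted. */
  static void sleepQuietly(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
    executor.execute(() -> sleepQuietly(100));
    System.out.println("terminated cleanly: " + stop(executor, 1000));
  }
}
```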



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-24 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023240#comment-17023240
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

I believe so. The console output of the last patch submission shows that the test 
case passed in 1.5 minutes.







[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-24 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Summary: MRApp helpers block for long intervals (500ms)  (was: MRApp 
helpers blocks for long intervals)

> MRApp helpers block for long intervals (500ms)
> --
>
> Key: MAPREDUCE-7262
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>
> MRApp has a set of methods used as helpers in test cases, such as 
> {{waitForInternalState(TA)}}, {{waitForState(TA)}}, {{waitForState(Job)}}, etc.
> When the expected state has not been reached yet, the thread sleeps for at 
> least 500ms before rechecking the state of the Job/TA.
> Example:
> {code:java}
>   public void waitForState(Task task, TaskState finalState) throws Exception {
>     int timeoutSecs = 0;
>     TaskReport report = task.getReport();
>     while (!finalState.equals(report.getTaskState()) &&
>         timeoutSecs++ < 20) {
>       System.out.println("Task State for " + task.getID() + " is : "
>           + report.getTaskState() + " Waiting for state : " + finalState
>           + "   progress : " + report.getProgress());
>       report = task.getReport();
>       Thread.sleep(500);
>     }
>     System.out.println("Task State is : " + report.getTaskState());
>     Assert.assertEquals("Task state is not correct (timedout)", finalState,
>         report.getTaskState());
>   }
> {code}
> I suggest reducing the sleep interval from 500ms to 50ms while increasing 
> the number of retries from 20 to 200, which keeps the overall timeout at 10 
> seconds. This will potentially make the test cases run faster. Also, the 
> {{System.out}} calls should be removed because dumping the current state on 
> every iteration adds no useful information.
> A tentative list of JUnit tests affected by the change:
> {code:bash}
> Method
> waitForInternalState(JobImpl, JobStateInternal)
> Found usages  (12 usages found)
> org.apache.hadoop.mapreduce.v2.app  (10 usages found)
> TestJobEndNotifier  (3 usages found)
> testNotificationOnLastRetry(boolean)  (1 usage found)
> 214 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testAbsentNotificationOnNotLastRetryUnregistrationFailure()  (1 
> usage found)
> 256 app.waitForInternalState(job, JobStateInternal.REBOOT);
> testNotificationOnLastRetryUnregistrationFailure()  (1 usage 
> found)
> 289 app.waitForInternalState(job, JobStateInternal.REBOOT);
> TestKill  (5 usages found)
> testKillJob()  (1 usage found)
> 70 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTask()  (1 usage found)
> 108 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testKillTaskWait()  (1 usage found)
> 219 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobAfterTA_DONE()  (1 usage found)
> 266 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> testKillTaskWaitKillJobBeforeTA_DONE()  (1 usage found)
> 316 app.waitForInternalState((JobImpl)job, 
> JobStateInternal.KILLED);
> TestMRApp  (2 usages found)
> testJobSuccess()  (1 usage found)
> 494 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
> testJobRebootOnLastRetryOnUnregistrationFailure()  (1 usage found)
> 542 app.waitForInternalState((JobImpl) job, 
> JobStateInternal.REBOOT);
> org.apache.hadoop.mapreduce.v2.app.rm  (2 usages found)
> TestRMContainerAllocator  (2 usages found)
> testReportedAppProgress()  (1 usage found)
> 1050 mrApp.waitForInternalState((JobImpl) job, 
> JobStateInternal.RUNNING);
> testReportedAppProgressWithOnlyMaps()  (1 usage found)
> 1202 mrApp.waitForInternalState((JobImpl)job, 
> JobStateInternal.RUNNING);
> --
> Method
> waitForState(TaskAttempt, TaskAttemptState)
> Found usages  (72 usages found)
> org.apache.hadoop.mapreduce.v2  (2 usages found)
> TestSpeculativeExecutionWithMRApp  (2 usages found)
> testSpeculateSuccessfulWithoutUpdateEvents()  (1 usage found)
> 212 app.waitForState(taskAttempt.getValue(), 
> TaskAttemptState.SUCCEEDED);
> testSpeculateSuccessfulWithUpdateEvents()  (1 usage found)
> 275 app.waitForState(taskAttempt.getValue(), 
> TaskAttemptState.SUCCEEDED);
> 

[jira] [Created] (MAPREDUCE-7262) MRApp helpers blocks for long intervals

2020-01-24 Thread Ahmed Hussein (Jira)
Ahmed Hussein created MAPREDUCE-7262:


 Summary: MRApp helpers blocks for long intervals
 Key: MAPREDUCE-7262
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7262
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


MRApp has a set of methods used as helpers in test cases, such as 
{{waitForInternalState(TA)}}, {{waitForState(TA)}}, {{waitForState(Job)}}, etc.

When the expected state has not been reached yet, the thread sleeps for at 
least 500ms before rechecking the state of the Job/TA.
Example:


{code:java}
  public void waitForState(Task task, TaskState finalState) throws Exception {
    int timeoutSecs = 0;
    TaskReport report = task.getReport();
    while (!finalState.equals(report.getTaskState()) &&
        timeoutSecs++ < 20) {
      System.out.println("Task State for " + task.getID() + " is : "
          + report.getTaskState() + " Waiting for state : " + finalState
          + "   progress : " + report.getProgress());
      report = task.getReport();
      Thread.sleep(500);
    }
    System.out.println("Task State is : " + report.getTaskState());
    Assert.assertEquals("Task state is not correct (timedout)", finalState,
        report.getTaskState());
  }
{code}

I suggest reducing the sleep interval from 500ms to 50ms while increasing the 
number of retries from 20 to 200, which keeps the overall timeout at 10 seconds. 
This will potentially make the test cases run faster. Also, the 
{{System.out}} calls should be removed because dumping the current state on 
every iteration adds no useful information.
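
The proposal above can be sketched as a generic polling helper. This is a minimal sketch, not the patch itself: the class name, the method name, and the use of a {{BooleanSupplier}} are assumptions made for illustration. With a 50ms interval and 200 retries, the worst-case wait stays at 10 seconds (200 x 50ms, the same as 20 x 500ms), but a state change is noticed within roughly 50ms instead of 500ms.

```java
import java.util.function.BooleanSupplier;

class PollingSketch {
  static final long INTERVAL_MS = 50; // proposed interval (was 500)
  static final int MAX_RETRIES = 200; // proposed retry count (was 20)

  /** Polls the condition until it holds or the retries run out; returns the final result. */
  static boolean waitFor(BooleanSupplier condition, long intervalMs, int maxRetries) {
    int retries = 0;
    while (!condition.getAsBoolean() && retries++ < maxRetries) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        break; // stop waiting if interrupted
      }
    }
    return condition.getAsBoolean();
  }

  public static void main(String[] args) {
    // Hypothetical condition that becomes true after about 120ms.
    long readyAt = System.currentTimeMillis() + 120;
    boolean reached =
        waitFor(() -> System.currentTimeMillis() >= readyAt, INTERVAL_MS, MAX_RETRIES);
    System.out.println("state reached: " + reached);
  }
}
```

A test helper built this way would assert on the returned value once, instead of printing the state on every iteration.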

A tentative list of JUnit tests affected by the change:


{code:bash}
Method
waitForInternalState(JobImpl, JobStateInternal)
Found usages  (12 usages found)
org.apache.hadoop.mapreduce.v2.app  (10 usages found)
TestJobEndNotifier  (3 usages found)
testNotificationOnLastRetry(boolean)  (1 usage found)
214 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
testAbsentNotificationOnNotLastRetryUnregistrationFailure()  (1 
usage found)
256 app.waitForInternalState(job, JobStateInternal.REBOOT);
testNotificationOnLastRetryUnregistrationFailure()  (1 usage found)
289 app.waitForInternalState(job, JobStateInternal.REBOOT);
TestKill  (5 usages found)
testKillJob()  (1 usage found)
70 app.waitForInternalState((JobImpl) job, 
JobStateInternal.RUNNING);
testKillTask()  (1 usage found)
108 app.waitForInternalState((JobImpl) job, 
JobStateInternal.RUNNING);
testKillTaskWait()  (1 usage found)
219 app.waitForInternalState((JobImpl) job, 
JobStateInternal.KILLED);
testKillTaskWaitKillJobAfterTA_DONE()  (1 usage found)
266 app.waitForInternalState((JobImpl)job, 
JobStateInternal.KILLED);
testKillTaskWaitKillJobBeforeTA_DONE()  (1 usage found)
316 app.waitForInternalState((JobImpl)job, 
JobStateInternal.KILLED);
TestMRApp  (2 usages found)
testJobSuccess()  (1 usage found)
494 app.waitForInternalState(job, JobStateInternal.SUCCEEDED);
testJobRebootOnLastRetryOnUnregistrationFailure()  (1 usage found)
542 app.waitForInternalState((JobImpl) job, 
JobStateInternal.REBOOT);
org.apache.hadoop.mapreduce.v2.app.rm  (2 usages found)
TestRMContainerAllocator  (2 usages found)
testReportedAppProgress()  (1 usage found)
1050 mrApp.waitForInternalState((JobImpl) job, 
JobStateInternal.RUNNING);
testReportedAppProgressWithOnlyMaps()  (1 usage found)
1202 mrApp.waitForInternalState((JobImpl)job, 
JobStateInternal.RUNNING);


--

Method
waitForState(TaskAttempt, TaskAttemptState)
Found usages  (72 usages found)
org.apache.hadoop.mapreduce.v2  (2 usages found)
TestSpeculativeExecutionWithMRApp  (2 usages found)
testSpeculateSuccessfulWithoutUpdateEvents()  (1 usage found)
212 app.waitForState(taskAttempt.getValue(), 
TaskAttemptState.SUCCEEDED);
testSpeculateSuccessfulWithUpdateEvents()  (1 usage found)
275 app.waitForState(taskAttempt.getValue(), 
TaskAttemptState.SUCCEEDED);
org.apache.hadoop.mapreduce.v2.app  (67 usages found)
TestAMInfos  (1 usage found)
testAMInfosWithoutRecoveryEnabled()  (1 usage found)
58 app.waitForState(taskAttempt, TaskAttemptState.RUNNING);
TestFetchFailure  (11 usages found)
testFetchFailure()  (3 usages found)
77 app.waitForState(mapAttempt1, TaskAttemptState.RUNNING);
109 app.waitForState(reduceAttempt, TaskAttemptState.RUNNING);
  

[jira] [Updated] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-24 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.010.patch

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch, 
> MAPREDUCE-7079.009.patch, MAPREDUCE-7079.010.patch
>
>






[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-24 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023318#comment-17023318
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

The test case {{TestMRIntermediateDataEncryption}} hangs forever for lack of 
entropy.
The problem was reported in another Jira, MAPREDUCE-7099.
Since this test case causes the failure of an entire module and causes other JUnit 
tests to run out of memory (OOM), the workaround in this patch eliminates the 
problem temporarily, with the aim of stabilizing Yetus.
The ideal fix is to increase the entropy of the Linux box, which has been submitted 
as a separate Jira, HADOOP-16810.
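
As a hedged diagnostic sketch for the entropy shortage mentioned above: on Linux, the kernel exposes its current entropy estimate in {{/proc/sys/kernel/random/entropy_avail}}. The helper below (hypothetical class and method names) reads that file and returns an empty result where it is unavailable, for example on Mac OS X. What counts as "too low" depends on the machine and is not specified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

class EntropyCheckSketch {
  // Linux-only file holding the kernel's entropy estimate, in bits.
  static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

  /** Returns the entropy estimate, or empty if the file is missing or unreadable. */
  static OptionalInt availableEntropyBits(Path source) {
    try {
      String text = new String(Files.readAllBytes(source), StandardCharsets.UTF_8).trim();
      return OptionalInt.of(Integer.parseInt(text));
    } catch (IOException | NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    OptionalInt bits = availableEntropyBits(ENTROPY_AVAIL);
    System.out.println(bits.isPresent()
        ? "entropy_avail = " + bits.getAsInt()
        : "entropy estimate not available on this platform");
  }
}
```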







[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-28 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Attachment: MAPREDUCE-7259-branch-2.10.005.patch

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7259-branch-2.10.005.patch, 
> MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch, MAPREDUCE-7259.003.patch, 
> MAPREDUCE-7259.004.patch, MAPREDUCE-7259.005.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The problem happens 
> because an assertion fails while waiting for the MRApp to stop.
> There may be a need to redesign the test case, because it is sensitive to 
> races and timing between the speculator and the tasks. It works fine for 
> the legacy estimator because the estimate is based on the start-end rate. 






[jira] [Commented] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-28 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025417#comment-17025417
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

Thanks [~epayne]!







[jira] [Updated] (MAPREDUCE-7262) MRApp helpers block for long intervals (500ms)

2020-01-27 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7262:
-
Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-10 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013234#comment-17013234
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

This test case has been failing forever.
* When it times out, {{MRAppMaster}} and some {{YarnChild}} processes remain 
running in the background. Therefore, the JVM running the tests fails due to 
OOM. No one notices that this unit test case has failed because the QA report 
lists the unit tests that failed, but not those that timed out.
* It works on Mac OS X, but never works on Linux running on a virtual Box. It 
only works on the latter when 
{{MRJobConfig.MR_ENCRYPTED_INTERMEDIATE_DATA}} is disabled.

Going back to the commit that added the JUnit test, the test fails there as well. 
So, in order to avoid the bogus QA reports caused by this JUnit test, I am going to 
disable it until I figure out whether Linux crashes because of entropy issues 
on virtual machines.

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, MAPREDUCE-7079.003.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.004.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch
>
>



[jira] [Commented] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-10 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17013326#comment-17013326
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

The test case was failing on virtual machines running Linux because the Linux 
VM lacks entropy.

I got it to pass by installing {{haveged}} and {{rng-tools}} on 
the virtual machine running RHEL 7.
Then, I started the rngd service with {{sudo service rngd start}}.

[~ste...@apache.org], is it possible to install those packages on the Linux 
image running the pre-commits?
Can you also please take a look at the patch? I also made some changes in the 
test case to reduce the overhead of cluster creation.
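
One way to check the low-entropy hypothesis on a build machine is to read the kernel's entropy estimate before and after starting rngd. The sketch below is only an illustration and is not part of the attached patch; the class name {{EntropyProbeSketch}} is made up, and it assumes the standard Linux file /proc/sys/kernel/random/entropy_avail. On other platforms it simply reports that no value is available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical diagnostic helper, not taken from the Hadoop code base.
final class EntropyProbeSketch {

  // Linux exposes the kernel's entropy estimate (in bits) through this file.
  static final Path ENTROPY_AVAIL =
      Paths.get("/proc/sys/kernel/random/entropy_avail");

  /** Returns the kernel's entropy estimate, or empty if it cannot be read. */
  static OptionalInt availableEntropyBits() {
    if (!Files.isReadable(ENTROPY_AVAIL)) {
      return OptionalInt.empty(); // not Linux, or /proc is not mounted
    }
    try {
      String text = new String(
          Files.readAllBytes(ENTROPY_AVAIL), StandardCharsets.US_ASCII).trim();
      return OptionalInt.of(Integer.parseInt(text));
    } catch (IOException | NumberFormatException e) {
      return OptionalInt.empty(); // unreadable or unexpected content
    }
  }

  public static void main(String[] args) {
    OptionalInt bits = availableEntropyBits();
    System.out.println(bits.isPresent()
        ? "entropy_avail = " + bits.getAsInt()
        : "entropy estimate not available on this platform");
  }
}
```

A value that stays very low while the test hangs would support the entropy explanation; it would not prove it.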


> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch
>
>



[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.005.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch
>
>



[jira] [Commented] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2020-01-09 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012011#comment-17012011
 ] 

Ahmed Hussein commented on MAPREDUCE-7252:
--

Thanks [~jeagles] !

{{TestMiniMRWithDFSWithDistinctUsers}} seems to fail intermittently.

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: MAPREDUCE-7252-branch-2.10.003.patch, 
> MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch, MAPREDUCE-7252.003.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.
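
To illustrate the corner case described above: in Java, dividing a double by a zero delta yields Infinity (or NaN for 0/0), and once such a value enters a smoothed forecast it never leaves. The sketch below is a minimal illustration and not the code from the patch; the class and method names are made up, the update rule is plain simple exponential smoothing, and keeping the previous forecast on zero progress is just one possible policy.

```java
// Hypothetical illustration of the zero-delta-progress corner case.
final class ZeroProgressSketch {

  /** Naive sample: time per unit of progress. Breaks when deltaProgress is 0. */
  static double naiveSample(double deltaTimeMs, double deltaProgress) {
    return deltaTimeMs / deltaProgress; // Infinity or NaN when deltaProgress == 0
  }

  /**
   * Guarded simple exponential smoothing step. When there is no usable
   * progress sample, the previous forecast is returned unchanged so that
   * NaN/Infinity never contaminates later forecasts.
   */
  static double guardedForecast(double previous, double deltaTimeMs,
      double deltaProgress, double alpha) {
    if (deltaProgress <= 0.0) {
      return previous; // assumption: skip the sample instead of dividing by zero
    }
    double sample = deltaTimeMs / deltaProgress;
    return alpha * sample + (1.0 - alpha) * previous;
  }

  public static void main(String[] args) {
    System.out.println(naiveSample(1000.0, 0.0));
    System.out.println(naiveSample(0.0, 0.0));
    System.out.println(guardedForecast(500.0, 1000.0, 0.0, 0.5));
  }
}
```

A real estimator probably also needs to treat a long run of zero-progress samples as a sign of a stalled attempt, which this sketch does not attempt.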






[jira] [Commented] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-15 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016132#comment-17016132
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

[~epayne] or [~ebadger], can either of you please take a look at the patch? It 
reduces the overhead of the test case.

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch
>
>



[jira] [Commented] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016975#comment-17016975
 ] 

Ahmed Hussein commented on MAPREDUCE-7079:
--

{{1000l}} is {{1000}} followed by a lowercase "{{l}}", which is easy to misread as the digit 1. I prefer to use capital "L" to avoid 
this confusion. I saw this in different places in the code.
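
A tiny illustration of the point, with made-up names: both suffixes produce the same {{long}} value, so switching to the uppercase form is purely a readability change.

```java
// Hypothetical example; the constants are not taken from the Hadoop code base.
final class LongLiteralSketch {

  // Lowercase suffix: legal Java, but "1000l" is easy to misread as "10001".
  static final long LOWERCASE_SUFFIX = 1000l;

  // Uppercase suffix: the same value, unambiguous on the page.
  static final long UPPERCASE_SUFFIX = 1000L;

  public static void main(String[] args) {
    // The two literals denote the same long value.
    System.out.println(LOWERCASE_SUFFIX == UPPERCASE_SUFFIX);
  }
}
```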

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch
>
>



[jira] [Commented] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016999#comment-17016999
 ] 

Ahmed Hussein commented on MAPREDUCE-7259:
--

So far, the problem seems to happen when the speculator speculates more than one 
task, which makes the MRApp take longer to terminate.
For now, I skip the timeout error when running the exponential estimator.

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The problem happens 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned because it is sensitive to races and 
> timing between the speculator and the 
> tasks. It works fine for the legacy estimator because the estimate is based 
> on start-end rate. 






[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Status: Patch Available  (was: Open)

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>



[jira] [Created] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)
Ahmed Hussein created MAPREDUCE-7259:


 Summary: testSpeculateSuccessfulWithUpdateEvents fails 
Intermittently  
 Key: MAPREDUCE-7259
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


{{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
fails intermittently with the exponential estimator. The problem happens 
because an assertion fails while waiting for the MRApp to stop.
The test case may need to be redesigned because it is sensitive to races and 
timing between the speculator and the tasks. It 
works fine for the legacy estimator because the estimate is based on start-end 
rate. 






[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Attachment: MAPREDUCE-7259.001.patch

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch
>
>



[jira] [Commented] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017056#comment-17017056
 ] 

Ahmed Hussein commented on MAPREDUCE-7259:
--

The assertion error:


{code:bash}
---
Test set: org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp
---
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 207.595 s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp
testSpeculateSuccessfulWithUpdateEvents[0: TaskEstimator(EstimatorClass class org.apache.hadoop.mapreduce.v2.app.speculate.SimpleExponentialTaskRuntimeEstimator)](org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp)  Time elapsed: 165.266 s  <<< FAILURE!
java.lang.AssertionError: Timeout while waiting for MRApp to stop
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.mapreduce.v2.app.MRApp.waitForState(MRApp.java:455)
    at org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents(TestSpeculativeExecutionWithMRApp.java:345)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}


> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch
>
>



[jira] [Updated] (MAPREDUCE-7261) Memory efficiency in speculator

2020-01-22 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7261:
-
Attachment: MAPREDUCE-7261.001.patch

> Memory efficiency in speculator 
> 
>
> Key: MAPREDUCE-7261
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7261
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7261.001.patch
>
>
> The data structures in the speculator and runtime estimator are bloating. Data 
> elements such as taskID, TA-ID, task stats, tasks speculated, tasks 
> finished, etc. are added to the concurrent maps but never removed.
> For long-running jobs, there are a couple of issues:
>  # memory leakage: the speculator memory usage increases over time. 
>  # performance: keeping large structures in the heap affects the performance 
> due to locality and cache misses.
> *Suggested Fixes:*
> - When a TA transitions to {{MoveContainerToSucceededFinishingTransition}}, 
> the TA notifies the speculator. The latter handles the event by cleaning the 
> internal structure accordingly.
> - When a task transitions to failed/killed, the speculator is notified to 
> clean the internal data structure.
>  
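
The suggested fixes above amount to pairing every insertion into the speculator's maps with a removal on a terminal event. The sketch below shows that pattern in isolation. It is not the attached patch: the class, the maps, and the handler names are hypothetical stand-ins, and plain strings replace the real task and attempt ID types.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the speculator's bookkeeping, for illustration only.
final class SpeculatorCleanupSketch {

  // Per-attempt statistics, keyed by attempt ID.
  private final Map<String, Long> attemptStartTimes = new ConcurrentHashMap<>();

  // Tasks for which a speculative attempt has been launched.
  private final Set<String> speculatedTasks = ConcurrentHashMap.newKeySet();

  void onAttemptStarted(String attemptId, long startTimeMs) {
    attemptStartTimes.put(attemptId, startTimeMs);
  }

  void onTaskSpeculated(String taskId) {
    speculatedTasks.add(taskId);
  }

  /** Called when an attempt reaches a terminal state (succeeded, failed, killed). */
  void onAttemptFinished(String attemptId) {
    attemptStartTimes.remove(attemptId);
  }

  /** Called when the task itself is done, so no further speculation is possible. */
  void onTaskFinished(String taskId) {
    speculatedTasks.remove(taskId);
  }

  /** Number of entries still held; should return to 0 once all tasks finish. */
  int trackedEntries() {
    return attemptStartTimes.size() + speculatedTasks.size();
  }
}
```

With this shape, the memory held by the speculator is bounded by the number of live attempts rather than by the total number of attempts the job has ever run.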






[jira] [Updated] (MAPREDUCE-7261) Memory efficiency in speculator

2020-01-22 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7261:
-
Status: Patch Available  (was: Open)

> Memory efficiency in speculator 
> 
>
> Key: MAPREDUCE-7261
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7261
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>



[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-22 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Attachment: MAPREDUCE-7259.004.patch

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch, 
> MAPREDUCE-7259.003.patch, MAPREDUCE-7259.004.patch
>
>



[jira] [Updated] (MAPREDUCE-7261) Memory efficiency in speculator

2020-01-22 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7261:
-
Attachment: MAPREDUCE-7261.002.patch

> Memory efficiency in speculator 
> 
>
> Key: MAPREDUCE-7261
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7261
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7261.001.patch, MAPREDUCE-7261.002.patch
>
>
> The data structures in the speculator and the runtime estimator keep growing. 
> Data elements such as (taskID, TA-ID, task stats, tasks speculated, tasks 
> finished, etc.) are added to the concurrent maps but never removed.
> For long-running jobs, there are a couple of issues:
>  # Memory leak: the speculator's memory usage increases over time. 
>  # Performance: keeping large structures in the heap hurts performance 
> due to poor locality and cache misses.
> *Suggested Fixes:*
> - When a TA transitions to {{MoveContainerToSucceededFinishingTransition}}, 
> the TA notifies the speculator. The latter handles the event by cleaning up 
> its internal structures accordingly.
> - When a task transitions to failed/killed, the speculator is notified to 
> clean up its internal data structures.
>  
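
The suggested fixes can be sketched as follows, assuming a simplified speculator that keeps per-attempt bookkeeping in concurrent collections. SpeculatorStateSketch, its fields, and its methods are hypothetical stand-ins for illustration, not the classes in the Hadoop code base and not the attached patch. The only point is that every entry added while an attempt runs is removed again once the attempt or task reaches a terminal state.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SpeculatorStateSketch {
  // Hypothetical per-attempt bookkeeping, keyed by task-attempt ID.
  private final Map<String, Double> attemptProgress = new ConcurrentHashMap<>();
  // Hypothetical set of task IDs that already have a speculative attempt.
  private final Set<String> speculatedTasks = ConcurrentHashMap.newKeySet();

  /** Records the latest progress reported by a running attempt. */
  void onProgressUpdate(String attemptId, double progress) {
    attemptProgress.put(attemptId, progress);
  }

  /** Remembers that a speculative attempt was launched for the task. */
  void onSpeculated(String taskId) {
    speculatedTasks.add(taskId);
  }

  /** Called when an attempt succeeds, fails, or is killed. */
  void onAttemptFinished(String attemptId) {
    attemptProgress.remove(attemptId);
  }

  /** Called when the whole task reaches a terminal state. */
  void onTaskFinished(String taskId) {
    speculatedTasks.remove(taskId);
  }

  /** Number of entries still held; stays bounded if cleanup is wired in. */
  int trackedEntries() {
    return attemptProgress.size() + speculatedTasks.size();
  }
}
```

With cleanup hooked into both terminal transitions, the number of tracked entries follows the number of live attempts rather than the total number of attempts the job has ever run.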






[jira] [Updated] (MAPREDUCE-7079) JobHistory#ServiceStop implementation is incorrect

2020-01-22 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Description: 
{{JobHistory.serviceStop}} skips waiting for the thread pool to terminate. The 
problem is due to an incorrect while condition that evaluates to false on the 
first iteration of the loop.

{code:java}
scheduledExecutor.shutdown();
boolean interrupted = false;
long currentTime = System.currentTimeMillis();
while (!scheduledExecutor.isShutdown()
    && System.currentTimeMillis() > currentTime + 1000l && !interrupted) {
  try {
    Thread.sleep(20);
  } catch (InterruptedException e) {
    interrupted = true;
  }
}
{code}

The expression "{{System.currentTimeMillis() > currentTime + 1000l}}" is false 
because currentTime was just initialized with {{System.currentTimeMillis()}}. 
As a result, the thread won't wait until the executor is terminated. 
Instead, it will force a shutdown immediately.
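
A minimal sketch of how the wait could be written, assuming the intent is to give the executor a bounded grace period before forcing a shutdown. ExecutorStopSketch and stopWithGracePeriod are hypothetical names, and this is not the attached patch. It relies on ExecutorService.awaitTermination, which blocks until the submitted tasks finish or the timeout elapses. Note also that isShutdown() returns true as soon as shutdown() has been called, so isTerminated() (or the return value of awaitTermination) is the state that reflects whether the pool has actually finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class ExecutorStopSketch {
  private ExecutorStopSketch() {
  }

  /**
   * Stops the executor, waiting up to graceMillis for running tasks to finish.
   * Returns true if the executor terminated on its own, false if shutdownNow()
   * had to be called.
   */
  static boolean stopWithGracePeriod(ExecutorService executor, long graceMillis) {
    executor.shutdown(); // stop accepting new tasks
    boolean interrupted = false;
    boolean terminated = false;
    try {
      // Blocks until all tasks finish or the grace period elapses.
      terminated = executor.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      interrupted = true;
    }
    if (!terminated) {
      executor.shutdownNow(); // force the shutdown only after the wait failed
    }
    if (interrupted) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
    return terminated;
  }
}
```

Calling stopWithGracePeriod(scheduledExecutor, 1000L) would correspond to the one-second budget in the original loop.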

*TestMRIntermediateDataEncryption is failing in precommit builds*

TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
which causes the unit tests in jobclient to not pass cleanly during precommit 
builds. From sample precommit console output, note the lack of a test results 
line when the test is run:
{noformat}
[INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 s 
- in org.apache.hadoop.mapred.TestSequenceFileInputFormat
[INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
[INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 s 
- in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
[...]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:14 h
[INFO] Finished at: 2018-04-12T04:27:06+00:00
[INFO] Final Memory: 24M/594M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}


  was:
TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
which causes the unit tests in jobclient to not pass cleanly during precommit 
builds. From sample precommit console output, note the lack of a test results 
line when the test is run:
{noformat}
[INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 s 
- in org.apache.hadoop.mapred.TestSequenceFileInputFormat
[INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
[INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 s 
- in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
[...]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 02:14 h
[INFO] Finished at: 2018-04-12T04:27:06+00:00
[INFO] Final Memory: 24M/594M
[INFO] 
[WARNING] The requested profile "parallel-tests" could not be activated because 
it does not exist.
[WARNING] The requested profile "native" could not be activated because it does 
not exist.
[WARNING] The requested profile "yarn-ui" could not be activated because it 
does not exist.
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
{noformat}


Summary: JobHistory#ServiceStop implementation is incorrect  (was: 
TestMRIntermediateDataEncryption is failing in precommit builds)

> JobHistory#ServiceStop implementation is incorrect
> --
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>

[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Status: Patch Available  (was: Open)

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.007.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Status: Open  (was: Patch Available)

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.006.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-17 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.008.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch, MAPREDUCE-7079.004.patch, MAPREDUCE-7079.005.patch, 
> MAPREDUCE-7079.006.patch, MAPREDUCE-7079.007.patch, MAPREDUCE-7079.008.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-21 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Status: Patch Available  (was: Open)

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The failure happens 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned because it is sensitive to the 
> races and timing between the speculator and the tasks. It works fine for the 
> legacy estimator because the estimate is based on the start-end rate. 






[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-21 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Status: Open  (was: Patch Available)

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The failure happens 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned because it is sensitive to the 
> races and timing between the speculator and the tasks. It works fine for the 
> legacy estimator because the estimate is based on the start-end rate. 






[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-21 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Attachment: MAPREDUCE-7259.003.patch

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch, 
> MAPREDUCE-7259.003.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The failure happens 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned because it is sensitive to the 
> races and timing between the speculator and the tasks. It works fine for the 
> legacy estimator because the estimate is based on the start-end rate. 






[jira] [Commented] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-21 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020282#comment-17020282
 ] 

Ahmed Hussein commented on MAPREDUCE-7259:
--

The thread dump taken when the JUnit test times out is the following:

 
{code:bash}
"CommitterEvent Processor #1"  prio=5 tid=42 in Object.wait()
java.lang.Thread.State: WAITING (on object monitor)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
"IPC Server Responder" daemon prio=5 tid=25 runnable
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
    at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
    at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1493)
    at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1476)
"CommitterEvent Handler" daemon prio=5 tid=39 in Object.wait()
java.lang.Thread.State: WAITING (on object monitor)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$2.run(CommitterEventHandler.java:145)
    at java.lang.Thread.run(Thread.java:748)
"Ping Checker for TaskAttemptFinishingMonitor" daemon prio=5 tid=38 timed_waiting
java.lang.Thread.State: TIMED_WAITING
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.yarn.util.AbstractLivelinessMonitor$PingChecker.run(AbstractLivelinessMonitor.java:155)
    at java.lang.Thread.run(Thread.java:748)
"qtp222915858-32" daemon prio=5 tid=32 runnable
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
    at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
    at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:466)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:403)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
    at org.eclipse.jetty.io.ManagedSelector$$Lambda$39/1289807664.run(Unknown Source)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)
    at java.lang.Thread.run(Thread.java:748)
"Listener at 0.0.0.0/51356" daemon prio=5 tid=13 runnable
java.lang.Thread.State: RUNNABLE
    at java.lang.Thread.dumpThreads(Native Method)
    at java.lang.Thread.getAllStackTraces(Thread.java:1610)
    at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
    at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
    at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:398)
    at org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp.waitForAppStop(TestSpeculativeExecutionWithMRApp.java:384)
    at org.apache.hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents(TestSpeculativeExecutionWithMRApp.java:327)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Updated] (MAPREDUCE-7259) testSpeculateSuccessfulWithUpdateEvents fails Intermittently

2020-01-21 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7259:
-
Attachment: MAPREDUCE-7259.002.patch

> testSpeculateSuccessfulWithUpdateEvents fails Intermittently  
> --
>
> Key: MAPREDUCE-7259
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7259
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7259.001.patch, MAPREDUCE-7259.002.patch
>
>
> {{TestSpeculativeExecutionWithMRApp.testSpeculateSuccessfulWithUpdateEvents}} 
> fails intermittently with the exponential estimator. The failure happens 
> because an assertion fails while waiting for the MRApp to stop.
> The test case may need to be redesigned because it is sensitive to the 
> races and timing between the speculator and the tasks. It works fine for the 
> legacy estimator because the estimate is based on the start-end rate. 






[jira] [Moved] (MAPREDUCE-7261) Memory efficiency in speculator

2020-01-21 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein moved YARN-9597 to MAPREDUCE-7261:


Key: MAPREDUCE-7261  (was: YARN-9597)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Memory efficiency in speculator 
> 
>
> Key: MAPREDUCE-7261
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7261
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
>
> The data structures in the speculator and the runtime estimator keep growing. 
> Data elements such as (taskID, TA-ID, task stats, tasks speculated, tasks 
> finished, etc.) are added to the concurrent maps but never removed.
> For long-running jobs, there are a couple of issues:
>  # Memory leak: the speculator's memory usage increases over time. 
>  # Performance: keeping large structures in the heap hurts performance 
> due to poor locality and cache misses.
> *Suggested Fixes:*
> - When a TA transitions to {{MoveContainerToSucceededFinishingTransition}}, 
> the TA notifies the speculator. The latter handles the event by cleaning up 
> its internal structures accordingly.
> - When a task transitions to failed/killed, the speculator is notified to 
> clean up its internal data structures.
>  






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-08 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.003.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, 
> MAPREDUCE-7079.003.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2020-01-08 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7252:
-
Attachment: MAPREDUCE-7252-branch-2.10.003.patch

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7252-branch-2.10.003.patch, 
> MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch, MAPREDUCE-7252.003.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.
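
A minimal sketch of the corner case described above, assuming the estimator smooths a progress rate (delta progress over delta time) and divides the remaining progress by it. The class and method names are hypothetical and are not taken from the attached patches; the guard simply refuses to produce a forecast when the smoothed rate is zero or the time delta is not positive, so NaN or Inf never enters the smoothed state.

```java
// Hypothetical illustration, not the Hadoop implementation.
final class SmoothedRateSketch {
  private final double alpha;      // smoothing factor, assumed to be in (0, 1]
  private double smoothedRate;     // progress units per millisecond
  private double lastProgress;
  private long lastTimeMs;
  private boolean hasSample = false;

  SmoothedRateSketch(double alpha, double startProgress, long startTimeMs) {
    this.alpha = alpha;
    this.lastProgress = startProgress;
    this.lastTimeMs = startTimeMs;
  }

  void update(double progress, long timeMs) {
    long dt = timeMs - lastTimeMs;
    if (dt <= 0) {
      return; // the rate is undefined for a zero or negative time delta
    }
    // A delta progress of 0 is a valid sample: it yields a rate of 0.
    double rate = Math.max(progress - lastProgress, 0.0) / dt;
    smoothedRate = hasSample ? alpha * rate + (1 - alpha) * smoothedRate : rate;
    hasSample = true;
    lastProgress = progress;
    lastTimeMs = timeMs;
  }

  /** Returns the estimated remaining time in ms, or -1 if no finite estimate exists. */
  long estimateRemainingMs() {
    if (!hasSample || smoothedRate <= 0.0) {
      return -1L; // avoid dividing by zero, which would give Inf or NaN
    }
    double remaining = (1.0 - lastProgress) / smoothedRate;
    if (Double.isNaN(remaining) || Double.isInfinite(remaining)) {
      return -1L;
    }
    return Math.round(remaining);
  }
}
```

Returning a sentinel instead of a non-finite value is one possible design choice here; a caller would then have to decide how to treat a task with no usable estimate.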






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: 
2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt
2020-01-10-MRApp-stack-dump.txt

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: 2020-01-10-MRApp-stack-dump.txt, 
> 2020-01-10-org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-version-14.txt,
>  MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch, MAPREDUCE-7079.003.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Commented] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2020-01-02 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006893#comment-17006893
 ] 

Ahmed Hussein commented on MAPREDUCE-7252:
--

[~jeagles] the error is not related to the current patch. It is an old bug that 
causes the tests to be flaky.

See the pending Jira MAPREDUCE-7099. I am still investigating when that bug 
started to break the tests.
{code:bash}
[WARNING] Tests run: 568, Failures: 0, Errors: 0, Skipped: 10
[INFO]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  02:00 h
[INFO] Finished at: 2020-01-01T00:06:16Z
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
{code}

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch, 
> MAPREDUCE-7252.003.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.






[jira] [Updated] (MAPREDUCE-7099) Daily test result fails in MapReduce JobClient though there isn't any error

2020-01-02 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7099:
-

This issue is caused by 
`org.apache.hadoop.mapred.TestMRIntermediateDataEncryption` taking too long. 
An older Jira, MAPREDUCE-7079, was filed for it, but it seems to have gone unnoticed.

I think the reason this bug has persisted for so long is that a Maven surefire 
timeout is treated as a successful build. That, of course, depends on the 
configuration of the pre-commit builds. 

> Daily test result fails in MapReduce JobClient though there isn't any error
> ---
>
> Key: MAPREDUCE-7099
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7099
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build, test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Critical
>
> Looks like the test result in MapReduce JobClient always fails lately. Please 
> see the results of hadoop-qbt-trunk-java8-linux-x86:
>  
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/]/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
> {noformat}
> [INFO] Results:
> [INFO] 
> [WARNING] Tests run: 565, Failures: 0, Errors: 0, Skipped: 10
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:06 h
> [INFO] Finished at: 2018-05-30T12:32:39+00:00
> [INFO] Final Memory: 25M/645M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "shelltest" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Assigned] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-03 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein reassigned MAPREDUCE-7079:


Assignee: Ahmed Hussein

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2019-12-30 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7252:
-
Attachment: MAPREDUCE-7252.003.patch

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch, 
> MAPREDUCE-7252.003.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.






[jira] [Commented] (MAPREDUCE-7099) Daily test result fails in MapReduce JobClient though there isn't any error

2019-12-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006286#comment-17006286
 ] 

Ahmed Hussein commented on MAPREDUCE-7099:
--

[~ste...@apache.org] I am reopening this issue. This bug is still alive. 

I can reproduce it locally on my server running GNU/Linux as well, but I have 
not yet found where the timeout comes from.


{code:bash}
[WARNING] Tests run: 568, Failures: 0, Errors: 0, Skipped: 10
[INFO]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  02:00 h
[INFO] Finished at: 2020-01-01T00:06:16Z
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
{code}


> Daily test result fails in MapReduce JobClient though there isn't any error
> ---
>
> Key: MAPREDUCE-7099
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7099
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build, test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Critical
>
> Looks like the test result in MapReduce JobClient always fails lately. Please 
> see the results of hadoop-qbt-trunk-java8-linux-x86:
>  
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/]/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
> {noformat}
> [INFO] Results:
> [INFO] 
> [WARNING] Tests run: 565, Failures: 0, Errors: 0, Skipped: 10
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:06 h
> [INFO] Finished at: 2018-05-30T12:32:39+00:00
> [INFO] Final Memory: 25M/645M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "shelltest" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Reopened] (MAPREDUCE-7099) Daily test result fails in MapReduce JobClient though there isn't any error

2019-12-31 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein reopened MAPREDUCE-7099:
--

> Daily test result fails in MapReduce JobClient though there isn't any error
> ---
>
> Key: MAPREDUCE-7099
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7099
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build, test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Critical
>
> Looks like the test result in MapReduce JobClient always fails lately. Please 
> see the results of hadoop-qbt-trunk-java8-linux-x86:
>  
> [https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/]/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
> {noformat}
> [INFO] Results:
> [INFO] 
> [WARNING] Tests run: 565, Failures: 0, Errors: 0, Skipped: 10
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 02:06 h
> [INFO] Finished at: 2018-05-30T12:32:39+00:00
> [INFO] Final Memory: 25M/645M
> [INFO] 
> 
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "shelltest" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Resolved] (MAPREDUCE-7231) hadoop-mapreduce-client-jobclient fails with timeout

2019-12-31 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein resolved MAPREDUCE-7231.
--
Resolution: Duplicate

Marking this issue as a duplicate of MAPREDUCE-7099. The trunk shows timeout 
errors.

> hadoop-mapreduce-client-jobclient fails with timeout
> 
>
> Key: MAPREDUCE-7231
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7231
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Maven_TestCase_Report.txt
>
>
> hadoop-mapreduce-client-jobclient fails with timeout
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) 
> on project hadoop-mapreduce-client-jobclient: There was a timeout or other 
> error in the fork -> [Help 1]
> {code}






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-12-31 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006240#comment-17006240
 ] 

Ahmed Hussein commented on MAPREDUCE-7169:
--

[~BilwaST], can you also add a configuration to enable/disable your code 
changes?
My intuition is that changing the policy for picking the node of the speculative 
task will inherently change the efficiency of the speculation.
For example, picking a different node may increase the startup time of the 
speculative task, which changes the speculation efficiency compared to 
the legacy behavior. Thus, I suggest giving the user the option to 
enable/disable the new policy, so that she can evaluate the new behavior 
and revert to the legacy one if necessary.
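
A rough sketch of what such a switch could look like. The property key and its default below are hypothetical and do not correspond to an existing Hadoop property; the sketch uses `java.util.Properties` only to stay self-contained, whereas in Hadoop the value would normally be read with `Configuration.getBoolean(name, defaultValue)`.

```java
import java.util.Properties;

// Hypothetical illustration of an on/off switch for the new placement policy.
final class SpeculativePlacementFlagSketch {
  // Assumed key name, invented for this example.
  static final String AVOID_SAME_NODE_KEY = "example.speculative.avoid-same-node";
  // Defaulting to false keeps the legacy behavior unless the user opts in.
  static final boolean AVOID_SAME_NODE_DEFAULT = false;

  static boolean avoidSameNode(Properties conf) {
    String raw = conf.getProperty(AVOID_SAME_NODE_KEY);
    return raw == null ? AVOID_SAME_NODE_DEFAULT : Boolean.parseBoolean(raw.trim());
  }
}
```

With a default of `false`, existing jobs would keep the legacy placement, and a user could enable the new policy per job to compare the two.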

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
> I found that in all versions of YARN, speculative execution may assign the 
> speculative task to the node of the original task. From what I have read, it 
> only tries to launch one more task attempt; I have not seen anything about 
> avoiding the same node. This is unreasonable: if the node has a problem that 
> makes task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2019-12-30 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7252:
-
Attachment: MAPREDUCE-7252.002.patch

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-12-30 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005468#comment-17005468
 ] 

Ahmed Hussein edited comment on MAPREDUCE-7169 at 12/30/19 4:58 PM:


[~BilwaST], the patch no longer applies to trunk. Can you please 
fix the compilation errors along with the Javadoc and checkstyle issues?

I see that you add the node hosting the original task to the blacklist of the 
speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.


was (Author: ahussein):
[~BilwaST], the patch is not applicable anymore with the trunk. Can you please 
fix the compilation errors along with the java doc and checkstyle issues?

 

I see that you add the node hosting the original task to the blacklist of the 
speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
> I found that in all versions of YARN, speculative execution may assign the 
> speculative task to the node of the original task. From what I have read, it 
> only tries to launch one more task attempt; I have not seen anything about 
> avoiding the same node. This is unreasonable: if the node has a problem that 
> makes task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-12-30 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005468#comment-17005468
 ] 

Ahmed Hussein commented on MAPREDUCE-7169:
--

[~BilwaST], the patch no longer applies to trunk. Can you please 
fix the compilation errors along with the Javadoc and checkstyle issues?

 

I see that you add the node hosting the original task to the blacklist of the 
speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.
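
The reordering idea can be sketched as follows, assuming the candidate hosts are available as a list of host names and the allocator walks that list in order. The class and method names are hypothetical; this is not code from the patch or from the Hadoop allocator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "pick the original node last".
final class HostOrderingSketch {
  /**
   * Returns a copy of dataLocalHosts with originalHost moved to the end,
   * so a loop over the result only reaches it after all other hosts.
   */
  static List<String> originalNodeLast(List<String> dataLocalHosts, String originalHost) {
    List<String> ordered = new ArrayList<>(dataLocalHosts.size());
    boolean sawOriginal = false;
    for (String host : dataLocalHosts) {
      if (host.equals(originalHost)) {
        sawOriginal = true;   // defer the original node
      } else {
        ordered.add(host);    // keep the relative order of the other hosts
      }
    }
    if (sawOriginal) {
      ordered.add(originalHost);
    }
    return ordered;
  }
}
```

Unlike blacklisting, this keeps the original node as a fallback: the speculative attempt can still land there when no other host can be assigned.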

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
> I found that in all versions of YARN, speculative execution may assign the 
> speculative task to the node of the original task. From what I have read, it 
> only tries to launch one more task attempt; I have not seen anything about 
> avoiding the same node. This is unreasonable: if the node has a problem that 
> makes task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-12-30 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005468#comment-17005468
 ] 

Ahmed Hussein edited comment on MAPREDUCE-7169 at 12/30/19 5:16 PM:


[~BilwaST], I am sorry, but the patch no longer applies to trunk. 
See YARN-9052, which causes the conflict. Can you please fix the compilation 
errors along with the Javadoc and checkstyle issues?

I see that you add the node hosting the original task to the blacklist of the 
speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.


was (Author: ahussein):
[~BilwaST], the patch is not applicable anymore with the trunk. Can you please 
fix the compilation errors along with the java doc and checkstyle issues?

I see that you add the node hosting the original task to the blacklist of the 
speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
> I found that in all versions of YARN, speculative execution may assign the 
> speculative task to the node of the original task. From what I have read, it 
> only tries to launch one more task attempt; I have not seen anything about 
> avoiding the same node. This is unreasonable: if the node has a problem that 
> makes task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
> In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2020-01-06 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009025#comment-17009025
 ] 

Ahmed Hussein commented on MAPREDUCE-7252:
--

The errors in the pre-commit build are caused by MAPREDUCE-7079 
("TestMRIntermediateDataEncryption is failing in precommit builds"), an 
old Jira that has not been resolved yet.

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch, 
> MAPREDUCE-7252.003.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) seems not 
> to handle the corner cases where the delta progress is 0. As a result, the 
> forecast will be NaN or Inf which messes up the subsequent forecast values.






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-06 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.001.patch
Status: Patch Available  (was: Open)

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7079.001.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the JVM 
> which causes the unit tests in jobclient to not pass cleanly during precommit 
> builds. From sample precommit console output, note the lack of a test results 
> line when the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
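> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] ------------------------------------------------------------------------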
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Updated] (MAPREDUCE-7079) TestMRIntermediateDataEncryption is failing in precommit builds

2020-01-06 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7079:
-
Attachment: MAPREDUCE-7079.002.patch

> TestMRIntermediateDataEncryption is failing in precommit builds
> ---
>
> Key: MAPREDUCE-7079
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7079
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jason Darrell Lowe
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7079.001.patch, MAPREDUCE-7079.002.patch
>
>
> TestMRIntermediateDataEncryption is either timing out or tearing down the 
> JVM, which prevents the unit tests in jobclient from passing cleanly during 
> precommit builds. In the sample precommit console output below, note the 
> lack of a test results line after the test is run:
> {noformat}
> [INFO] Running org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.976 
> s - in org.apache.hadoop.mapred.TestSequenceFileInputFormat
> [INFO] Running org.apache.hadoop.mapred.TestMRIntermediateDataEncryption
> [INFO] Running org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.659 
> s - in org.apache.hadoop.mapred.TestSpecialCharactersInOutputPath
> [...]
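> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 02:14 h
> [INFO] Finished at: 2018-04-12T04:27:06+00:00
> [INFO] Final Memory: 24M/594M
> [INFO] ------------------------------------------------------------------------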
> [WARNING] The requested profile "parallel-tests" could not be activated 
> because it does not exist.
> [WARNING] The requested profile "native" could not be activated because it 
> does not exist.
> [WARNING] The requested profile "yarn-ui" could not be activated because it 
> does not exist.
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
> project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
> in the fork -> [Help 1]
> {noformat}






[jira] [Commented] (MAPREDUCE-7252) Handling 0 progress in SimpleExponential task runtime estimator

2020-01-07 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009930#comment-17009930
 ] 

Ahmed Hussein commented on MAPREDUCE-7252:
--

Thanks [~jeagles]!

> Handling 0 progress in SimpleExponential task runtime estimator
> ---
>
> Key: MAPREDUCE-7252
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7252
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: MAPREDUCE-7252.001.patch, MAPREDUCE-7252.002.patch, 
> MAPREDUCE-7252.003.patch
>
>
> The simple exponential runtime estimator (added in MAPREDUCE-7208) does not 
> seem to handle the corner case where the delta progress is 0. As a result, 
> the forecast becomes NaN or Inf, which corrupts the subsequent forecast values.






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Attachment: MAPREDUCE-7272.001.patch

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to log the task progress, it is still 
> excessive to log it on every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
> is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
> logs the progress every 3 minutes. This is helpful for long-running tasks 
> that take a long time to reach the delta threshold.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.delta.time}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
> two flags and logs the task progress on every update.
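
The rule described above can be sketched in Java as follows. This is only an
illustration under stated assumptions: the class ProgressLogThrottle, its
fields, and shouldLog() are hypothetical names that do not come from the
attached patch, and the thresholds are passed in directly instead of being read
from the job configuration.

```java
/** Hypothetical sketch of delta- and time-based log throttling. */
final class ProgressLogThrottle {
    private final double minDelta;   // e.g. 0.05 for 5% of progress
    private final long maxWaitMs;    // e.g. 60_000 for one minute
    private double lastLoggedProgress;
    private long lastLoggedTimeMs;

    ProgressLogThrottle(double minDelta, long maxWaitMs, long startTimeMs) {
        this.minDelta = minDelta;
        this.maxWaitMs = maxWaitMs;
        this.lastLoggedProgress = 0.0;
        this.lastLoggedTimeMs = startTimeMs;
    }

    /**
     * Returns true when this update should be logged: the progress delta or
     * the wait interval has been reached, or debug logging is enabled.
     */
    boolean shouldLog(double progress, long nowMs, boolean debugEnabled) {
        boolean deltaReached = progress - lastLoggedProgress >= minDelta;
        boolean waitReached = nowMs - lastLoggedTimeMs >= maxWaitMs;
        if (debugEnabled || deltaReached || waitReached) {
            // Remember what was logged so the next decision is relative to it.
            lastLoggedProgress = progress;
            lastLoggedTimeMs = nowMs;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProgressLogThrottle t = new ProgressLogThrottle(0.25, 60_000L, 0L);
        System.out.println(t.shouldLog(0.125, 1_000L, false));  // below both thresholds
        System.out.println(t.shouldLog(0.25, 2_000L, false));   // progress delta reached
        System.out.println(t.shouldLog(0.375, 62_000L, false)); // wait interval reached
    }
}
```

In this sketch the state is kept per task attempt, so each attempt would need
its own ProgressLogThrottle instance.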






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Status: Patch Available  (was: Open)

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to log the task progress, it is still 
> excessive to log it on every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
> is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
> logs the progress every 3 minutes. This is helpful for long-running tasks 
> that take a long time to reach the delta threshold.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.delta.time}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
> two flags and logs the task progress on every update.






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Description: 
{{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
call, the listener uses {{LOG.info()}} to print out the progress of the 
{{TaskAttempt}}.
{code:java}
taskAttemptStatus.progress = taskStatus.getProgress();
LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
+ taskStatus.getProgress());
{code}
 
{code:bash}
2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_007783_0 is : 0.40713295
2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_020681_0 is : 0.55573714
2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_024371_0 is : 0.54190344
2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_033182_0 is : 0.50264555
2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_022375_0 is : 0.5495565
{code}
After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
thought that while it is helpful to log the task progress, it is still 
excessive to log it on every update.
 This Jira is to suppress the excessive logging from TaskAttemptListener 
without affecting the frequency of progress updates. 
 There are two flags:
 * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
is logged each time it advances by 10%. Default is 5%.
 * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
logs the progress every 3 minutes. This is helpful for long-running tasks 
that take a long time to reach the delta threshold. Default is 1 minute.

The listener logs as soon as either {{delta.threshold}} or 
{{wait.delta.time}} is reached, whichever comes first. 
 Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
two flags and logs the task progress on every update.

  was:
{{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
call, the listener uses {{LOG.info()}} to print out the progress of the 
{{TaskAttempt}}.
{code:java}
taskAttemptStatus.progress = taskStatus.getProgress();
LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
+ taskStatus.getProgress());
{code}
 
{code:bash}
2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_007783_0 is : 0.40713295
2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_020681_0 is : 0.55573714
2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_024371_0 is : 0.54190344
2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_033182_0 is : 0.50264555
2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_022375_0 is : 0.5495565
{code}
After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
thought that while it is helpful to log the task progress, it is still 
excessive to log it on every update.
 This Jira is to suppress the excessive logging from TaskAttemptListener 
without affecting the frequency of progress updates. 
 There are two flags:
 * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
is logged each time it advances by 10%. Default is 5%.
 * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
logs the progress every 3 minutes. This is helpful for long-running tasks 
that take a long time to reach the delta threshold.

The listener logs as soon as either {{delta.threshold}} or 
{{wait.delta.time}} is reached, whichever comes first. 
 Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
two flags and logs the task progress on every update.


> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: 

[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-09 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Attachment: MAPREDUCE-7272-branch-2.10.001.patch

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to log the task progress, it is still 
> excessive to log it on every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
> is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
> logs the progress every 3 minutes. This is helpful for long-running tasks 
> that take a long time to reach the delta threshold. Default is 1 minute.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.delta.time}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
> two flags and logs the task progress on every update.






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Attachment: MAPREDUCE-7272-branch-2.10.002.patch

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to log the task progress, it is still 
> excessive to log it on every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
> is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
> logs the progress every 3 minutes. This is helpful for long-running tasks 
> that take a long time to reach the delta threshold. Default is 1 minute.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.delta.time}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
> two flags and logs the task progress on every update.






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Attachment: MAPREDUCE-7272.002.patch

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272.001.patch, MAPREDUCE-7272.002.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to log the task progress, it is still 
> excessive to log it on every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: the task progress 
> is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: the listener 
> logs the progress every 2 minutes. This is helpful for long-running tasks 
> that take a long time to reach the delta threshold. Default is 1 minute.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.interval-seconds}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
> two flags and logs the task progress on every update.
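
As an illustration of the renamed settings, the following Java sketch reads the
two keys with the defaults stated above (5% and 1 minute). It uses
java.util.Properties as a stand-in for the job configuration, which is an
assumption made to keep the example self-contained. The class
ProgressLogSettings is hypothetical and is not part of the patch.

```java
import java.util.Properties;

/** Hypothetical holder for the two logging thresholds. */
final class ProgressLogSettings {
    // Key names as given in the issue description.
    static final String DELTA_KEY =
        "mapreduce.task.log.progress.delta.threshold";
    static final String WAIT_KEY =
        "mapreduce.task.log.progress.wait.interval-seconds";

    final double deltaThreshold; // fraction of progress, e.g. 0.05 for 5%
    final long waitIntervalMs;   // wait interval converted to milliseconds

    ProgressLogSettings(Properties props) {
        // Defaults follow the description: 5% of progress and 1 minute.
        deltaThreshold = Double.parseDouble(props.getProperty(DELTA_KEY, "0.05"));
        long seconds = Long.parseLong(props.getProperty(WAIT_KEY, "60"));
        waitIntervalMs = seconds * 1000L;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(DELTA_KEY, "0.10");
        props.setProperty(WAIT_KEY, "120");
        ProgressLogSettings settings = new ProgressLogSettings(props);
        System.out.println(settings.deltaThreshold + " / " + settings.waitIntervalMs + " ms");
    }
}
```

Converting the interval to milliseconds once, at construction time, keeps the
per-update comparison against timestamps simple.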






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Status: Patch Available  (was: Open)

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
> call, the listener uses {{LOG.info()}} to print out the progress of the 
> {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to log the task progress, it is still 
> excessive to log it on every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: the task progress 
> is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: the listener 
> logs the progress every 2 minutes. This is helpful for long-running tasks 
> that take a long time to reach the delta threshold. Default is 1 minute.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.interval-seconds}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
> two flags and logs the task progress on every update.






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Description: 
{{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On every 
call, the listener uses {{LOG.info()}} to print out the progress of the 
{{TaskAttempt}}.
{code:java}
taskAttemptStatus.progress = taskStatus.getProgress();
LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
+ taskStatus.getProgress());
{code}
 
{code:bash}
2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_007783_0 is : 0.40713295
2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_020681_0 is : 0.55573714
2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_024371_0 is : 0.54190344
2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_033182_0 is : 0.50264555
2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_022375_0 is : 0.5495565
{code}
After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
agreed that while it is helpful to log task progress, logging it on every 
update is excessive.
 This Jira is to suppress the excessive logging from TaskAttemptListener 
without affecting the frequency of progress updates. 
 There are two flags:
 * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: the task progress 
is logged each time it advances by 10%. Default is 5%.
 * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: the listener 
logs the progress every 2 minutes. This is helpful for long-running tasks 
that take a long time to reach the delta threshold. Default is 1 minute.

The listener logs as soon as either {{delta.threshold}} or 
{{wait.interval-seconds}} is reached, whichever comes first. 
 Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
two flags and logs the task progress on every update.
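
The "whichever is reached first" rule above can be sketched as a small throttle. This is only an illustration under stated assumptions, not the code from the attached patches: the class {{ProgressLogThrottle}}, its constructor, and {{shouldLog()}} are hypothetical names, and the real listener may track this state differently (for example per task attempt).

```java
/**
 * Hypothetical sketch (not the patch): decides whether a progress update
 * should be logged, based on a progress delta and a wait interval.
 */
class ProgressLogThrottle {
  private final double deltaThreshold;    // e.g. 0.05 for the 5% default
  private final long waitIntervalMillis;  // e.g. 60_000 for the 1 minute default
  private double lastLoggedProgress = 0.0;
  private long lastLoggedTimeMillis;

  ProgressLogThrottle(double deltaThreshold, long waitIntervalSeconds,
      long startTimeMillis) {
    this.deltaThreshold = deltaThreshold;
    this.waitIntervalMillis = waitIntervalSeconds * 1000L;
    this.lastLoggedTimeMillis = startTimeMillis;
  }

  /**
   * Returns true if this update should be logged. Debug logging bypasses
   * both thresholds, so every update is logged when it is enabled.
   */
  boolean shouldLog(double progress, long nowMillis, boolean debugEnabled) {
    boolean deltaReached =
        progress - lastLoggedProgress >= deltaThreshold;
    boolean intervalReached =
        nowMillis - lastLoggedTimeMillis >= waitIntervalMillis;
    if (debugEnabled || deltaReached || intervalReached) {
      // Remember what was logged so the next comparison is relative to it.
      lastLoggedProgress = progress;
      lastLoggedTimeMillis = nowMillis;
      return true;
    }
    return false;
  }
}
```

In this sketch the caller would still record every progress update and only guard the {{LOG.info()}} call with {{shouldLog()}}, so the frequency of progress updates is unaffected.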

  was:
{{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. On 
every call, the listener uses {{LOG.info()}} to print out the progress of the 
{{TaskAttempt}}.
{code:java}
taskAttemptStatus.progress = taskStatus.getProgress();
LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
+ taskStatus.getProgress());
{code}
 
{code:bash}
2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_007783_0 is : 0.40713295
2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_020681_0 is : 0.55573714
2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_024371_0 is : 0.54190344
2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_033182_0 is : 0.50264555
2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_022375_0 is : 0.5495565
{code}
After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
agreed that while it is helpful to log task progress, logging it on every 
update is excessive.
 This Jira is to suppress the excessive logging from TaskAttemptListener 
without affecting the frequency of progress updates. 
 There are two flags:
 * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: the task progress 
is logged each time it advances by 10%. Default is 5%.
 * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: the listener 
logs the progress every 3 minutes. This is helpful for long-running tasks 
that take a long time to reach the delta threshold. Default is 1 minute.

The listener logs as soon as either {{delta.threshold}} or {{wait.delta.time}} 
is reached, whichever comes first. 
 Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides those 
two flags and logs the task progress on every update.


> TaskAttemptListenerImpl excessive log messages
> --
>
> 

[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Status: Open  (was: Patch Available)

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} bloats the log files. 
> On every call, the listener uses {{LOG.info()}} to print out the progress of 
> the {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
> LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
> + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> agreed that while it is helpful to log task progress, logging it on every 
> update is excessive.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.log.progress.delta.threshold=0.10}}: the task 
> progress is logged each time it advances by 10%. Default is 5%.
>  * {{-Dmapreduce.task.log.progress.wait.interval-seconds=120}}: the 
> listener logs the progress every 2 minutes. This is helpful for long-running 
> tasks that take a long time to reach the delta threshold. Default is 
> 1 minute.
> The listener logs as soon as either {{delta.threshold}} or 
> {{wait.interval-seconds}} is reached, whichever comes first. 
>  Enabling {{LOG.DEBUG}} for {{TaskAttemptListenerImpl}} overrides 
> those two flags and logs the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Attachment: MAPREDUCE-7272-branch-2.10.003.patch

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272.001.patch
>
>






[jira] [Updated] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-13 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-
Attachment: MAPREDUCE-7272.004.patch

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272-branch-2.10.004.patch, MAPREDUCE-7272.001.patch, 
> MAPREDUCE-7272.002.patch, MAPREDUCE-7272.003.patch, MAPREDUCE-7272.004.patch
>
>






[jira] [Commented] (MAPREDUCE-7272) TaskAttemptListenerImpl excessive log messages

2020-04-10 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080965#comment-17080965
 ] 

Ahmed Hussein commented on MAPREDUCE-7272:
--

Thanks [~epayne] for the feedback.

I fixed the problems reported by findbugs and uploaded the patches 
[^MAPREDUCE-7272.003.patch] and [^MAPREDUCE-7272-branch-2.10.004.patch].

> TaskAttemptListenerImpl excessive log messages
> --
>
> Key: MAPREDUCE-7272
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: MAPREDUCE-7272-branch-2.10.001.patch, 
> MAPREDUCE-7272-branch-2.10.002.patch, MAPREDUCE-7272-branch-2.10.003.patch, 
> MAPREDUCE-7272-branch-2.10.004.patch, MAPREDUCE-7272.001.patch, 
> MAPREDUCE-7272.002.patch, MAPREDUCE-7272.003.patch
>
>





