[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-11-18 Thread zhao bo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976478#comment-16976478
 ] 

zhao bo commented on SPARK-28770:
-

https://github.com/apache/spark/pull/25673

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Assignee: Wing Yew Poon
>Priority: Major
> Fix For: 3.0.0
>
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-04 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922936#comment-16922936
 ] 

Jungtaek Lim commented on SPARK-28770:
--

Great! Thanks both of you for understanding.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-04 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922925#comment-16922925
 ] 

Wing Yew Poon commented on SPARK-28770:
---

As [~irashid] noted, following an offline discussion, we agree with [~kabhwan] 
that the existing implementation of EventMonster, extending 
EventLoggingListener, does not serve the purposes of ReplayListenerSuite. I 
have updated my PR to fix EventMonster instead of adding filtering logic in 
testApplicationReplay.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-04 Thread Imran Rashid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922825#comment-16922825
 ] 

Imran Rashid commented on SPARK-28770:
--

Wing yew and I chatted about this offline a bit.  I follow his reasoning now.  
We also think [~kabhwan] has a good point.  I think the connection between 
EventLoggingListener and EventMonster is probably not so good, it is from long 
ago (eg. before there was a SparkFirehoseListener).  So he's gonna try to 
explore changing it do the right thing and update the PR (assuming no 
unexpected surprises)

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-04 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922777#comment-16922777
 ] 

Jungtaek Lim commented on SPARK-28770:
--

As I commented in PR, I'm not sure EventMonster fits with the purpose of 
testing - why we concern about the criteria in ReplayListenerSuite? 
ReplayListener is designed to read and post all events to its listeners. If 
EventMonster just buffers the events as it receives, it shouldn't happen, as 
EventLoggingListener will buffer the events exactly same as what it writes to 
the event log file.

It shows weird behavior or at least requires us to know about the details of 
EventLoggingListener because EventMonster extends EventLoggingListener and it 
doesn't buffer received events transparently.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-04 Thread Imran Rashid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922771#comment-16922771
 ] 

Imran Rashid commented on SPARK-28770:
--

I'm not sure I follow ... the metrics from the driver will be in the log file 
if and only if {{park.eventLog.logStageExecutorMetrics.enabled=true}} and there 
is a stage completed event.  But so that same criteria should apply on replay, 
so the EventMonster would also "log" the metrics update (meaning, record it in 
its internal buffer) if and only if the event log file also contained the stage 
completed event.

[~bzhaoopenstack] since you can reproduce this consistently, could you share 
the event log file from a failure?  I think you just need to delete this block 
in ReplayListenerQuite:

{code}
  after {
Utils.deleteRecursively(testDir)
  }
{code}

and maybe print out that dir somewhere so you can grab the file.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-03 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921955#comment-16921955
 ] 

Wing Yew Poon commented on SPARK-28770:
---

My earlier analysis of why the test failure occurs was deficient.
The real reason is that ReplayListenerSuite.testApplicationReplay fails if the 
application runs long enough for the driver to send an executor metrics update. 
This causes stage executor metrics to be written for the driver. However, 
executor metrics updates are not logged, and thus not replayed. Therefore no 
stage executor metrics for the driver is logged on replay. In this scenario, 
there will be more events in the original than logged by the replay listener, 
and the events won't line up.
I was able to engineer this failure by reducing the heartbeat interval at which 
the driver sends executor metrics updates.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-03 Thread zhao bo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921224#comment-16921224
 ] 

zhao bo commented on SPARK-28770:
-

Hi guys,

Thank you very much for your concern. I had post an PR for this, 
[https://github.com/apache/spark/pull/25659] ,

Could you please to have a look? Is that fit your kind suggest? Thanks.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-09-01 Thread huangtianhua (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920587#comment-16920587
 ] 

huangtianhua commented on SPARK-28770:
--

[~wypoon], thank you for looking into this, so you suggest to delete the 
compare of SparkListenerStageExecutorMetrics events for the two failed tests? 

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-31 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920234#comment-16920234
 ] 

Wing Yew Poon commented on SPARK-28770:
---

I looked into the issue further. In EventLoggingListener, almost all calls to 
logEvent (to write serialized JSON to the event log) are as a direct result of 
an onXXX method being called. The exception is that within onStageCompleted, 
before calling logEvent with the SparkListenerStageCompleted event, if we are 
logging stage executor metrics, there is a bulk call to logEvent with 
SparkListenerStageExecutorMetrics events via a Map.foreach. This Map.foreach 
bulk operation may not log the events with the same order. This is also the 
only place where SparkListenerStageExecutorMetrics events get logged.
For this reason, I think the affected tests ("End-to-end replay" and 
"End-to-end replay with compression", both implemented by calling 
testApplicationReplay) should not compare the SparkListenerStageExecutorMetrics 
events. That should eliminate the indeterminacy of the tests.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-31 Thread Wing Yew Poon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920212#comment-16920212
 ] 

Wing Yew Poon commented on SPARK-28770:
---

On my branch from which [https://github.com/apache/spark/pull/23767] was merged 
into master, I modified ReplayListenerSuite following 
[https://gist.github.com/dwickern/6ba9c5c505d2325d3737ace059302922], and ran 
"End-to-end replay with compression" 100 times. I encountered no failures. I 
ran this on my MacBook Pro.
 The instance of failure that Jungtaek cited appears to be due to a comparison 
of two SparkListenerStageExecutorMetrics events (one from the original, the 
other from the replay) failing. One event came from the driver and the other 
came from executor "1". SparkListenerStageExecutorMetrics events are logged at 
stage completion if spark.eventLog.logStageExecutorMetrics.enabled is set to 
true. The failure could be due to these events being in a different order in 
the replay than in the original. 
 In the commit that first introduced these events, in ReplayListenerSuite, 
there was some code to filter out these events in the testApplicationReplay 
method of ReplayListenerSuite. (The code was to filter out the events from the 
original, not from the replay, which I didn't understand.) Maybe we could 
filter out the SparkListenerStageExecutorMetrics events (from both original and 
replay) in testApplicationReplay (which is called by "End-to-end replay" and 
"End-to-end replay with compression"), to avoid this flakiness.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-30 Thread zhao bo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920009#comment-16920009
 ] 

zhao bo commented on SPARK-28770:
-

Thanks Lim, yeah, we also found that most test jobs will pass towards these 
tests. For us, It's hard to say this issue is trully exist on X86, but on ARM, 
we failed every time.

 

Hope team can see what happened. What we do on ARM, is just revert the commit  
[https://github.com/apache/spark/pull/23767], then all pass.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-30 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919982#comment-16919982
 ] 

Jungtaek Lim commented on SPARK-28770:
--

Just hit again.

[https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109965/testReport]

> we suspect there is something related with the commit 
>[https://github.com/apache/spark/pull/23767], we tried to revert it and the 
>tests are passed:

It's not occurred frequently, so you may need to run 100 times to make sure 
reverting would help.

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28770) Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression failed

2019-08-21 Thread zhao bo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912821#comment-16912821
 ] 

zhao bo commented on SPARK-28770:
-

Hi team, is there any update about this issue?

> Flaky Tests: Test ReplayListenerSuite.End-to-end replay with compression 
> failed
> ---
>
> Key: SPARK-28770
> URL: https://issues.apache.org/jira/browse/SPARK-28770
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: Community jenkins and our arm testing instance.
>Reporter: huangtianhua
>Priority: Major
>
> Test
> org.apache.spark.scheduler.ReplayListenerSuite.End-to-end replay with 
> compression is failed  see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/267/testReport/junit/org.apache.spark.scheduler/ReplayListenerSuite/End_to_end_replay_with_compression/]
>  
> And also the test is failed on arm instance, I sent email to spark-dev 
> before, and we suspect there is something related with the commit 
> [https://github.com/apache/spark/pull/23767], we tried to revert it and the 
> tests are passed:
> ReplayListenerSuite:
>        - ...
>        - End-to-end replay *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622)
>        - End-to-end replay with compression *** FAILED ***
>          "[driver]" did not equal "[1]" (JsonProtocolSuite.scala:622) 
>  
> Not sure what's wrong, hope someone can help to figure it out, thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org