[jira] [Commented] (BEAM-3798) Performance tests flaky due to Dataflow transient errors
[ https://issues.apache.org/jira/browse/BEAM-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418517#comment-16418517 ]

Chamikara Jayalath commented on BEAM-3798:
------------------------------------------

Sounds good. Closing.

> Performance tests flaky due to Dataflow transient errors
> --------------------------------------------------------
>
>                 Key: BEAM-3798
>                 URL: https://issues.apache.org/jira/browse/BEAM-3798
>             Project: Beam
>          Issue Type: Sub-task
>          Components: runner-dataflow
>            Reporter: Łukasz Gajowy
>            Assignee: Łukasz Gajowy
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Performance tests are flaky due to transient errors that happen during data
> processing (e.g. a SocketTimeoutException while connecting to the DB). Currently,
> exceptions that happen on the Dataflow runner but are retried successfully fail
> the test regardless of the final job state (giving a false-negative result).
> Possible solution for batch scenarios:
> We could "rethrow" exceptions that happened due to transient errors *only* if
> the job status is other than DONE.
> Possible solution for streaming scenarios:
> (don't know yet)
> [Link to discussion on dev list|https://lists.apache.org/thread.html/e480f8181913dc81d2d4cd1430557a646537473ccf29fe6390229098@%3Cdev.beam.apache.org%3E]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
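The batch-scenario idea in the issue description (surface transient errors *only* if the job status is other than DONE) can be sketched roughly as below. This is a minimal, hypothetical Java illustration, not the change from the submitted PR: `JobState` is an assumed stand-in for a runner's terminal job state (Beam's `PipelineResult.State` also has a `DONE` value, but this sketch does not depend on Beam), and `verifyJob` and the list of logged errors are illustrative names only.

```java
import java.net.SocketTimeoutException;
import java.util.List;

/**
 * Hypothetical sketch: errors logged while a batch job ran are surfaced only
 * when the job did not finish in DONE.
 */
public class TransientErrorPolicy {

  /** Hypothetical stand-in for a runner's terminal job state. */
  enum JobState { DONE, FAILED, CANCELLED, UNKNOWN }

  /**
   * Returns normally if the job reached DONE (errors were retried successfully).
   * Otherwise throws an AssertionError with every logged error attached as a
   * suppressed exception.
   */
  static void verifyJob(JobState finalState, List<Exception> loggedErrors) {
    if (finalState == JobState.DONE) {
      return; // transient errors were retried successfully; do not fail the test
    }
    AssertionError failure =
        new AssertionError("Job finished in state " + finalState
            + " with " + loggedErrors.size() + " logged error(s)");
    for (Exception e : loggedErrors) {
      failure.addSuppressed(e); // keep the original errors for diagnosis
    }
    throw failure;
  }

  public static void main(String[] args) {
    List<Exception> errors = List.of(new SocketTimeoutException("connect timed out"));
    verifyJob(JobState.DONE, errors); // no exception: the job succeeded after retries
    try {
      verifyJob(JobState.FAILED, errors);
    } catch (AssertionError expected) {
      System.out.println(expected.getMessage());
    }
  }
}
```

Attaching the logged errors as suppressed exceptions keeps the original stack traces visible when the job really failed, while a job that ends in DONE does not produce a false-negative test result.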
[jira] [Commented] (BEAM-3798) Performance tests flaky due to Dataflow transient errors
[ https://issues.apache.org/jira/browse/BEAM-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415503#comment-16415503 ]

Łukasz Gajowy commented on BEAM-3798:
-------------------------------------

Should we close this issue? It was fixed by the submitted PR. Other problems with IOITs are addressed in separate issues.

> Performance tests flaky due to Dataflow transient errors
> --------------------------------------------------------
>
>                 Key: BEAM-3798
>                 URL: https://issues.apache.org/jira/browse/BEAM-3798
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Łukasz Gajowy
>            Assignee: Łukasz Gajowy
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
[jira] [Commented] (BEAM-3798) Performance tests flaky due to Dataflow transient errors
[ https://issues.apache.org/jira/browse/BEAM-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390090#comment-16390090 ]

Łukasz Gajowy commented on BEAM-3798:
-------------------------------------

I only have access to the Jenkins logs, and I'm not sure the Dataflow job ID is visible in them. I have reproduced the issue many times on our own Dataflow project. Example Jenkins logs of a job that seems to have this issue: [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_JDBC/291/console]

> Performance tests flaky due to Dataflow transient errors
> --------------------------------------------------------
>
>                 Key: BEAM-3798
>                 URL: https://issues.apache.org/jira/browse/BEAM-3798
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Łukasz Gajowy
>            Assignee: Thomas Groh
>            Priority: Major
[jira] [Commented] (BEAM-3798) Performance tests flaky due to Dataflow transient errors
[ https://issues.apache.org/jira/browse/BEAM-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390022#comment-16390022 ]

Chamikara Jayalath commented on BEAM-3798:
------------------------------------------

Do you have the Dataflow job ID of a job that passes with transient errors? (I couldn't find it in the Jenkins logs.)

> Performance tests flaky due to Dataflow transient errors
> --------------------------------------------------------
>
>                 Key: BEAM-3798
>                 URL: https://issues.apache.org/jira/browse/BEAM-3798
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Łukasz Gajowy
>            Assignee: Thomas Groh
>            Priority: Major