[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-11 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646845#comment-16646845
 ] 

Scott Wegner commented on BEAM-5467:


Thanks for the additional context. I'm not an expert on diagnosing memory 
issues, but here's what I can pull out of there:

* The build scan shows [some stats on memory 
usage|https://scans.gradle.com/s/f2u3q2obrgaqu/performance/build#memory], and 
for this build I see "PS Eden Space" of 1.36/1.36 GB (99.5%). I would deduce 
that the JVM ran out of its allotted memory, causing the segfault.
* The [infrastructure 
tab|https://scans.gradle.com/s/f2u3q2obrgaqu#infrastructure] shows the "Max JVM 
memory heap size" for the job: 3824 MB
* In the [timeline|https://scans.gradle.com/s/f2u3q2obrgaqu/timeline] I can see 
that the task that failed was 
{{:beam-sdks-python:flinkCompatibilityMatrixBatch}}. Nothing was running 
concurrently as part of the build, so either this task ate up the entire heap 
space, or some previous task is leaking memory.

My recommendation would be to work towards getting a local repro so that you 
can attach a memory profiler and validate potential fixes. The Jenkins job 
shows the full command-line used to launch the job, including JVM memory 
configuration:

{{gradlew --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g 
-Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:flinkCompatibilityMatrixBatch 
:beam-sdks-python:flinkCompatibilityMatrixStreaming}}
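For a local repro, that invocation could be scripted so each run also reports a rough memory figure. The Python sketch below is an assumption-laden helper, not part of the Beam build. `build_gradle_command` and `run_and_measure` are hypothetical names. It assumes it is run from the root of a Beam checkout that contains `./gradlew`, and it relies on the Unix-only `resource` module. It passes `-Xms2g -Xmx4g` as a single `org.gradle.jvmargs` value because the Jenkins command sets that same property twice, in which case the later value may simply replace the earlier one. A memory profiler attached to the JVM would still give far more detail than this.

```python
import resource
import subprocess
from typing import List, Tuple


def build_gradle_command(xms: str = "2g", xmx: str = "4g",
                         max_workers: int = 12) -> List[str]:
    """Hypothetical helper mirroring the Jenkins invocation for a local repro.

    Both heap flags go into one org.gradle.jvmargs value (an assumption about
    intent: the Jenkins command line sets that property twice).
    """
    return [
        "./gradlew", "--info", "--continue",
        f"--max-workers={max_workers}",
        f"-Dorg.gradle.jvmargs=-Xms{xms} -Xmx{xmx}",
        ":beam-sdks-python:flinkCompatibilityMatrixBatch",
        ":beam-sdks-python:flinkCompatibilityMatrixStreaming",
    ]


def run_and_measure(cmd: List[str]) -> Tuple[int, int]:
    """Run cmd to completion and return (exit code, peak child RSS).

    On Linux, ru_maxrss for RUSAGE_CHILDREN is in KiB and reports the largest
    waited-for descendant only; a detached, long-lived Gradle daemon is not
    counted.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss
```

Usage would be `run_and_measure(build_gradle_command())`, repeated with different `xmx` values to see whether the segfault tracks the heap setting.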

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-11 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646790#comment-16646790
 ] 

Thomas Weise commented on BEAM-5467:


[~swegner] here is an example. (It was necessary to download the full log to 
find the error message.)

[https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/305/consoleText]
{code:java}
# Thread: 
Segmentation fault (core dumped)

> Task :beam-sdks-python:flinkCompatibilityMatrixBatch FAILED

...


BUILD FAILED in 11m 6s
59 actionable tasks: 54 executed, 4 from cache, 1 up-to-date

Publishing build scan...
https://gradle.com/s/f2u3q2obrgaqu

Build step 'Invoke Gradle script' changed build result to FAILURE
Build step 'Invoke Gradle script' marked build as failure
Sending e-mails to: commits@beam.apache.org
Finished: FAILURE
{code}
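Since the error only showed up in the full console log, a small filter could pull the relevant lines out of a downloaded `consoleText`. This is a minimal Python sketch; `fetch_console_text`, `find_failure_lines` and the regular expressions are assumptions, with the patterns taken only from the failure lines quoted in this thread.

```python
import re
import urllib.request
from typing import List

# Assumed patterns, based only on the failure lines quoted in this thread.
FAILURE_PATTERNS = [
    re.compile(r"Segmentation fault"),
    re.compile(r"> Task \S+ FAILED"),
    re.compile(r"non-zero exit value \d+"),
]


def fetch_console_text(url: str) -> str:
    """Download a Jenkins consoleText page as text (hypothetical helper)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_failure_lines(log_text: str) -> List[str]:
    """Return the stripped log lines matching any failure pattern, in order."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(p.search(line) for p in FAILURE_PATTERNS)
    ]
```

For the run above this would be `find_failure_lines(fetch_console_text("https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/305/consoleText"))`, provided that URL is still reachable.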
 



[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-11 Thread Scott Wegner (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646757#comment-16646757
 ] 

Scott Wegner commented on BEAM-5467:


[~thw] offhand this doesn't look familiar to me. Can you link to a Jenkins job 
run / Gradle build scan with more details?



[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-05 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640413#comment-16640413
 ] 

Thomas Weise commented on BEAM-5467:


Yes, and as you had also noticed earlier, they fail in Jenkins frequently with:
Segmentation fault (core dumped)
[https://builds.apache.org/job/beam_PostCommit_Python_VR_Flink/]

[~swegner] do you have an idea what the cause could be or where to look for 
the next level of detail? Perhaps memory settings?
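One place to look for the next level of detail, assuming the process that received the signal is a Python process (the log does not say which process crashed), is Python's standard `faulthandler` module: once enabled with `faulthandler.enable()` or `python -X faulthandler`, it writes the Python traceback of every thread to stderr when a fatal signal such as SIGSEGV arrives. The sketch below demonstrates this on a deliberately crashing child. `CRASHING_CHILD`, `run_crashing_child` and `killed_by_sigsegv` are hypothetical names, and reading address 0 through `ctypes` is only a convenient way to trigger a segfault.

```python
import signal
import subprocess
import sys

# Child program: enable faulthandler first, then segfault on purpose by
# reading a C string at address 0 through ctypes.
CRASHING_CHILD = (
    "import ctypes, faulthandler\n"
    "faulthandler.enable()\n"
    "ctypes.string_at(0)\n"
)


def run_crashing_child() -> subprocess.CompletedProcess:
    """Run the child and capture stderr, where faulthandler writes its dump."""
    return subprocess.run(
        [sys.executable, "-c", CRASHING_CHILD],
        capture_output=True,
        text=True,
    )


def killed_by_sigsegv(result: subprocess.CompletedProcess) -> bool:
    """subprocess reports death by signal N as a return code of -N."""
    return result.returncode == -signal.SIGSEGV
```

In a real test run the `faulthandler.enable()` call (or the `-X faulthandler` flag) would go where the Python worker starts, so the traceback lands in the Jenkins console next to `Segmentation fault (core dumped)`.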

 

 

 

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-10-05 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640350#comment-16640350
 ] 

Ankur Goenka commented on BEAM-5467:


The tests pass consistently on a local Jenkins setup.



[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-09-27 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631049#comment-16631049
 ] 

Ankur Goenka commented on BEAM-5467:


I verified that they get executed sequentially, so that should not be a problem:

:beam-sdks-python:flinkCompatibilityMatrixBatch FAILED
Started: 5m 27.699s
Duration: 4m 0.393s

:beam-sdks-python:flinkCompatibilityMatrixStreaming FAILED
Started: 9m 28.093s
Duration: 1m 13.280s
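The timings above can be checked mechanically: the batch task's end (start plus duration) should be no later than the streaming task's start. A small Python sketch; `to_millis` and `runs_sequentially` are hypothetical helpers, and the parser only accepts the `Xm Y.ZZZs` shape shown in the build scan. Times are converted to whole milliseconds to avoid floating-point rounding.

```python
import re

_TIME = re.compile(r"(?:(\d+)m\s*)?(\d+)\.(\d{3})s")


def to_millis(text: str) -> int:
    """Parse a build-scan time such as '5m 27.699s' into whole milliseconds."""
    match = _TIME.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2))
    millis = int(match.group(3))
    return (minutes * 60 + seconds) * 1000 + millis


def runs_sequentially(first_start: str, first_duration: str,
                      second_start: str) -> bool:
    """True if the first task finished no later than the second one started."""
    first_end = to_millis(first_start) + to_millis(first_duration)
    return first_end <= to_millis(second_start)
```

With the numbers above, the batch task ends at 568.092 s and the streaming task starts at 568.093 s, 1 ms later, so `runs_sequentially("5m 27.699s", "4m 0.393s", "9m 28.093s")` is `True`.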

> Python Flink ValidatesRunner job fixes
> --
>
> Key: BEAM-5467
> URL: https://issues.apache.org/jira/browse/BEAM-5467
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Labels: portability-flink
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  





[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-09-27 Thread Thomas Weise (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631010#comment-16631010
 ] 

Thomas Weise commented on BEAM-5467:


[~angoenka] should we try to turn off the parallel execution?

I also think we should move the following to a distinct task in 
sdks/python/build.gradle:
{code:java}
tasks(':beam-sdks-python:flinkCompatibilityMatrixBatch')
tasks(':beam-sdks-python:flinkCompatibilityMatrixStreaming'){code}
 

 



[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

2018-09-27 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631002#comment-16631002
 ] 

Ankur Goenka commented on BEAM-5467:


Anecdotally, tasks are failing because of a segfault with the following error:

04:18:38 Segmentation fault (core dumped)
04:18:38
04:18:38 > Task :beam-sdks-python:flinkCompatibilityMatrixStreaming FAILED
04:18:38 :beam-sdks-python:flinkCompatibilityMatrixStreaming (Thread[Task worker for ':' Thread 6,5,main]) completed. Took 1 mins 13.28 secs.
04:18:38
04:18:38 FAILURE: Build completed with 2 failures.
04:18:38
04:18:38 1: Task failed with an exception.
04:18:38 ---
04:18:38 * Where:
04:18:38 Build file '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_VR_Flink/src/sdks/python/build.gradle' line: 340
04:18:38
04:18:38 * What went wrong:
04:18:38 Execution failed for task ':beam-sdks-python:flinkCompatibilityMatrixBatch'.
04:18:38 > Process 'command 'sh'' finished with non-zero exit value 139
04:18:38
04:18:38 * Try:
04:18:38 Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output. Run with --scan to get full insights.
04:18:38 ==
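The `non-zero exit value 139` and the `Segmentation fault (core dumped)` line tell the same story: bash, dash and most other shells report a child killed by signal N with exit status 128 + N, and 139 - 128 = 11, which is SIGSEGV on Linux. A small Python sketch for decoding such values; `signal_from_shell_status` is a hypothetical helper, and signal numbers are platform-specific.

```python
import signal
from typing import Optional


def signal_from_shell_status(status: int) -> Optional[str]:
    """Map a shell exit status of 128 + N back to the name of signal N.

    Returns None for statuses that do not encode a known signal on this
    platform (signal numbers differ between operating systems).
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None
```

On Linux, `signal_from_shell_status(139)` gives `"SIGSEGV"`, whereas an out-of-memory kill by the kernel would typically surface as 137 (`"SIGKILL"`), which helps tell a native crash apart from the OOM killer.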
