[jira] [Commented] (FLINK-13550) Support for CPU FlameGraphs in new web UI

2019-11-28 Thread David Moravek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984769#comment-16984769
 ] 

David Moravek commented on FLINK-13550:
---

Hi [~xintongsong], we already use this in our internal build and I'm definitely 
planning to contribute it back. It'd be helpful if you could assign the issue 
to me. I'll try to send an initial PR with the REST endpoint within the next week.

> Support for CPU FlameGraphs in new web UI
> -
>
> Key: FLINK-13550
> URL: https://issues.apache.org/jira/browse/FLINK-13550
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: David Moravek
>Priority: Major
>
> For better insight into a running job, it would be useful to have the ability 
> to render a CPU flame graph for a particular job vertex.
> Flink already has a stack-trace sampling mechanism in place, so this should be 
> straightforward to implement.
> This should be done by implementing a new REST API endpoint, which would sample 
> the stack traces the same way the current BackPressureTracker does, only with a 
> different sampling rate and sampling duration.
> [Here|https://www.youtube.com/watch?v=GUNDehj9z9o] is a little demo of the 
> feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14746) History server does not handle uncaught exceptions in archive fetcher.

2019-11-13 Thread David Moravek (Jira)
David Moravek created FLINK-14746:
-

 Summary: History server does not handle uncaught exceptions in 
archive fetcher.
 Key: FLINK-14746
 URL: https://issues.apache.org/jira/browse/FLINK-14746
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.9.1, 1.8.2
Reporter: David Moravek


In case the archive fetcher fails with an error (e.g. an OOM while parsing JSON 
archives), the error is swallowed by the ScheduledExecutorService and the submitted 
runnable is never rescheduled.
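A minimal sketch of the failure mode and an obvious mitigation, assuming a plain 
ScheduledExecutorService (the fetchArchives method and the scheduling period are 
illustrative, not the actual HistoryServerArchiveFetcher code):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ArchiveFetcherScheduling {

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

        // Problem: if fetchArchives() throws, scheduleWithFixedDelay suppresses all further
        // executions and the failure only surfaces via the returned Future, which nobody reads.
        executor.scheduleWithFixedDelay(ArchiveFetcherScheduling::fetchArchives, 0, 10, TimeUnit.SECONDS);

        // Mitigation: catch Throwable inside the runnable (log it, or escalate fatal errors
        // explicitly), so the task keeps getting rescheduled.
        executor.scheduleWithFixedDelay(() -> {
            try {
                fetchArchives();
            } catch (Throwable t) {
                System.err.println("Failed to fetch archives: " + t);
            }
        }, 0, 10, TimeUnit.SECONDS);
    }

    private static void fetchArchives() {
        // download and parse JSON archives; may throw runtime exceptions or an OutOfMemoryError
    }
}
{code}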



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14709) Allow outputting elements in close method of chained drivers.

2019-11-11 Thread David Moravek (Jira)
David Moravek created FLINK-14709:
-

 Summary: Allow outputting elements in close method of chained 
drivers.
 Key: FLINK-14709
 URL: https://issues.apache.org/jira/browse/FLINK-14709
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Affects Versions: 1.9.1, 1.8.1, 1.7.2
Reporter: David Moravek


Currently, BatchTask and DataSourceTask only allow outputting elements in the close 
method of the "rich" operators that they directly execute.

The task workflow is as follows:
1) open the "head" driver (calls the "open" method on the udf)
2) open the chained drivers
3) run the "head" driver
4) close the "head" driver (calls the "close" method on the udf)
5) close the output collector (no elements can be collected after this point)
6) close the chained drivers

In order to properly support outputs from the close method, we want to swap steps 5) 
and 6), as sketched below. We also need to tweak the implementation of the Reduce / 
Combine chained drivers, because they dispose of their sorters in the closeTask method 
(this should be done in the close method instead).
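A rough sketch of the proposed teardown order, assuming simplified stand-in types and 
names (Driver, Collector, headDriver, chainedDrivers are illustrative, not the actual 
BatchTask members):

{code}
import java.util.List;

/** Hedged sketch of the proposed teardown order; Driver/Collector are stand-ins, not Flink's actual types. */
final class TaskTeardownSketch {

    interface Driver { void close() throws Exception; }
    interface Collector { void close(); }

    static void closeTask(Driver headDriver, List<Driver> chainedDrivers, Collector output) throws Exception {
        headDriver.close();                 // 4) calls "close" on the head udf, may still collect elements
        for (Driver chained : chainedDrivers) {
            chained.close();                // 6) moved before 5): chained close() may emit trailing elements
        }
        output.close();                     // 5) only now is the output collector closed
    }
}
{code}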

This would bring a huge performance improvement for Beam users, because we could 
properly implement bundling in batch mode (whole partition = single bundle).




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14169) Cleanup expired jobs from history server

2019-09-23 Thread David Moravek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935722#comment-16935722
 ] 

David Moravek commented on FLINK-14169:
---

Can you please assign this to [~david.hrbacek]? We have this planned for the 
current sprint. Also, should this behavior be configurable, so it's consistent 
with prior versions?

Thanks,
D.

> Cleanup expired jobs from history server
> 
>
> Key: FLINK-14169
> URL: https://issues.apache.org/jira/browse/FLINK-14169
> Project: Flink
>  Issue Type: Improvement
>Reporter: David Moravek
>Priority: Minor
>
> Clean up jobs that are no longer present in the history refresh locations during 
> JobArchiveFetcher::run.
> https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServerArchiveFetcher.java#L138



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14169) Cleanup expired jobs from history server

2019-09-23 Thread David Moravek (Jira)
David Moravek created FLINK-14169:
-

 Summary: Cleanup expired jobs from history server
 Key: FLINK-14169
 URL: https://issues.apache.org/jira/browse/FLINK-14169
 Project: Flink
  Issue Type: Improvement
Reporter: David Moravek


Clean up jobs that are no longer present in the history refresh locations during 
JobArchiveFetcher::run.

https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServerArchiveFetcher.java#L138
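A hedged sketch of the cleanup step; the cachedJobIds set and the webJobDir layout are 
simplifications of the fetcher's actual state, and the method name is mine:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Set;
import java.util.stream.Stream;

final class ExpiredJobCleanup {

    /** Drops locally cached jobs that are no longer present in any of the refresh locations. */
    static void removeExpiredJobs(Set<String> cachedJobIds, Set<String> jobIdsInRefreshLocations, Path webJobDir) {
        cachedJobIds.removeIf(jobId -> {
            if (jobIdsInRefreshLocations.contains(jobId)) {
                return false; // still archived upstream, keep it
            }
            try {
                Path jobDir = webJobDir.resolve(jobId);
                if (Files.exists(jobDir)) {
                    try (Stream<Path> paths = Files.walk(jobDir)) {
                        // delete the cached files bottom-up
                        paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
                    }
                }
                return true; // forget the job so the UI no longer lists it
            } catch (IOException e) {
                return false; // keep it for now and retry on the next fetcher run
            }
        });
    }
}
{code}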



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-14166) Reuse cache from previous history server run

2019-09-23 Thread David Moravek (Jira)
David Moravek created FLINK-14166:
-

 Summary: Reuse cache from previous history server run
 Key: FLINK-14166
 URL: https://issues.apache.org/jira/browse/FLINK-14166
 Project: Flink
  Issue Type: Improvement
Reporter: David Moravek


Currently, the history server is not able to reuse the cache from a previous run, even 
when `historyserver.web.tmpdir` is set. It could simply "warm up" the set of cached job 
ids from previously parsed jobs.

https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServerArchiveFetcher.java#L129

This should be configurable, so it does not break backward compatibility.
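A minimal sketch of the warm-up, assuming previously fetched jobs are kept as 
<jobId>.json files under the configured tmpdir (the layout and names below are 
illustrative, not the actual history server code):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Stream;

final class CacheWarmup {

    /** Seeds the in-memory set of already-processed job ids from a previous run's tmpdir. */
    static void warmUpCachedJobs(Path webJobDir, Set<String> cachedArchives) throws IOException {
        if (!Files.isDirectory(webJobDir)) {
            return; // nothing cached from a previous run
        }
        try (Stream<Path> files = Files.list(webJobDir)) {
            files.map(p -> p.getFileName().toString())
                 .filter(name -> name.endsWith(".json"))
                 .map(name -> name.substring(0, name.length() - ".json".length()))
                 .forEach(cachedArchives::add); // job will not be re-downloaded and re-parsed
        }
    }
}
{code}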



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13958) Job class loader may not be reused after batch job recovery

2019-09-05 Thread David Moravek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923264#comment-16923264
 ] 

David Moravek commented on FLINK-13958:
---

Hi Till, we're using attached mode on YARN. We'll try the detached mode instead 
and let you know. Anyway, do you think this is an issue worth fixing in 
general? If so, what would be the correct approach?

> Job class loader may not be reused after batch job recovery
> ---
>
> Key: FLINK-13958
> URL: https://issues.apache.org/jira/browse/FLINK-13958
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.9.0
>Reporter: David Moravek
>Priority: Major
>
> [https://lists.apache.org/thread.html/e241be9a1a10810a1203786dff3b7386d265fbe8702815a77bad42eb@%3Cdev.flink.apache.org%3E|http://example.com]
> 1) We have a per-job Flink cluster
> 2) We use BATCH execution mode + region failover strategy
> Point 1) should imply a single user code class loader per task manager (because 
> there is only a single pipeline, which reuses the class loader cached in 
> BlobLibraryCacheManager). We need this property because we have UDFs that 
> access C libraries using JNI (I think this may be a fairly common use case when 
> dealing with legacy code). [JDK 
> internals|https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466]
>  make sure that a single library can only be loaded by a single class loader 
> per JVM.
> When region recovery is triggered, vertices that need recovery are first reset 
> back to the CREATED state and then rescheduled. In case all tasks in a task 
> manager are reset, this results in the [cached class loader being 
> released|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L338].
>  This unfortunately causes a job failure, because we try to reload a native 
> library in a newly created class loader.
> I believe the correct approach would be not to release the cached class loader 
> if the job is recovering, even though there are no tasks currently registered 
> with the TM.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders

2019-09-04 Thread David Moravek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922555#comment-16922555
 ] 

David Moravek commented on FLINK-11402:
---

This is a [JVM 
limitation|https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466]
 and you have to make sure to load native libraries using the system class loader 
if you're submitting multiple jobs into the same cluster.
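For illustration, a minimal sketch of that workaround: ship a small holder class in a 
jar under Flink's lib/ directory so the parent class loader owns the native binding 
(the class and library names below are made up):

{code}
/**
 * Placed in a jar under Flink's lib/ directory (not in the user jar), so the system/parent
 * class loader owns the native binding and it survives user-code class loader churn.
 */
public final class NativeLibHolder {

    static {
        System.loadLibrary("mynative"); // loaded exactly once per JVM
    }

    /** User code calls this instead of loading the library itself. */
    public static void ensureLoaded() {
        // the static initializer above has already run by the time this returns
    }
}
{code}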

> User code can fail with an UnsatisfiedLinkError in the presence of multiple 
> classloaders
> 
>
> Key: FLINK-11402
> URL: https://issues.apache.org/jira/browse/FLINK-11402
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Runtime / Task
>Affects Versions: 1.7.0
>Reporter: Ufuk Celebi
>Priority: Major
> Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz
>
>
> As reported on the user mailing list thread "[`env.java.opts` not persisting 
> after job canceled or failed and then 
> restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E];,
>  there can be issues with using native libraries and user code class loading.
> h2. Steps to reproduce
> I was able to reproduce the issue reported on the mailing list using 
> [snappy-java|https://github.com/xerial/snappy-java] in a user program. 
> Running the attached user program works fine on initial submission, but 
> results in a failure when re-executed.
> I'm using Flink 1.7.0 with a standalone cluster started via 
> {{bin/start-cluster.sh}}.
> 0. Unpack attached Maven project and build using {{mvn clean package}} *or* 
> directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}}
> 1. Download 
> [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar]
>  and unpack libsnappyjava for your system:
> {code}
> jar tf snappy-java-1.1.7.2.jar | grep libsnappy
> ...
> org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so
> ...
> org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib
> ...
> {code}
> 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}} 
> (path needs to be adjusted for your system):
> {code}
> env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64
> {code}
> 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}}
> {code}
> bin/flink run hello-snappy-1.0-SNAPSHOT.jar
> Starting execution of program
> Program execution finished
> Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished.
> Job Runtime: 359 ms
> {code}
> 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}}
> {code}
> bin/flink run hello-snappy-1.0-SNAPSHOT.jar
> Starting execution of program
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Job failed. 
> (JobID: 7d69baca58f33180cb9251449ddcd396)
>   at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
>   at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487)
>   at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>   at com.github.uce.HelloSnappy.main(HelloSnappy.java:18)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
>   at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
>   at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
>   at 
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
>   at 
> 

[jira] [Commented] (FLINK-13958) Job class loader may not be reused after batch job recovery

2019-09-04 Thread David Moravek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922554#comment-16922554
 ] 

David Moravek commented on FLINK-13958:
---

[~1u0] It's an unrelated issue. The behavior you are describing is expected when 
you submit multiple jobs into the same cluster. The only option in that case is to 
work around it using the system class loader (it's technically not a workaround).

> Job class loader may not be reused after batch job recovery
> ---
>
> Key: FLINK-13958
> URL: https://issues.apache.org/jira/browse/FLINK-13958
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.9.0
>Reporter: David Moravek
>Priority: Major
>
> [https://lists.apache.org/thread.html/e241be9a1a10810a1203786dff3b7386d265fbe8702815a77bad42eb@%3Cdev.flink.apache.org%3E|http://example.com]
> 1) We have a per-job Flink cluster
> 2) We use BATCH execution mode + region failover strategy
> Point 1) should imply a single user code class loader per task manager (because 
> there is only a single pipeline, which reuses the class loader cached in 
> BlobLibraryCacheManager). We need this property because we have UDFs that 
> access C libraries using JNI (I think this may be a fairly common use case when 
> dealing with legacy code). [JDK 
> internals|https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466]
>  make sure that a single library can only be loaded by a single class loader 
> per JVM.
> When region recovery is triggered, vertices that need recovery are first reset 
> back to the CREATED state and then rescheduled. In case all tasks in a task 
> manager are reset, this results in the [cached class loader being 
> released|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L338].
>  This unfortunately causes a job failure, because we try to reload a native 
> library in a newly created class loader.
> I believe the correct approach would be not to release the cached class loader 
> if the job is recovering, even though there are no tasks currently registered 
> with the TM.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-13958) Job class loader may not be reused after batch job recovery

2019-09-04 Thread David Moravek (Jira)
David Moravek created FLINK-13958:
-

 Summary: Job class loader may not be reused after batch job 
recovery
 Key: FLINK-13958
 URL: https://issues.apache.org/jira/browse/FLINK-13958
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Affects Versions: 1.9.0
Reporter: David Moravek


[https://lists.apache.org/thread.html/e241be9a1a10810a1203786dff3b7386d265fbe8702815a77bad42eb@%3Cdev.flink.apache.org%3E|http://example.com]

1) We have a per-job Flink cluster
2) We use BATCH execution mode + region failover strategy

Point 1) should imply a single user code class loader per task manager (because 
there is only a single pipeline, which reuses the class loader cached in 
BlobLibraryCacheManager). We need this property because we have UDFs that 
access C libraries using JNI (I think this may be a fairly common use case when 
dealing with legacy code). [JDK 
internals|https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L2466]
 make sure that a single library can only be loaded by a single class loader per 
JVM.

When region recovery is triggered, vertices that need recovery are first reset 
back to the CREATED state and then rescheduled. In case all tasks in a task 
manager are reset, this results in the [cached class loader being 
released|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L338].
 This unfortunately causes a job failure, because we try to reload a native 
library in a newly created class loader.

I believe the correct approach would be not to release the cached class loader if 
the job is recovering, even though there are no tasks currently registered with 
the TM.
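For illustration, a minimal UDF shape that triggers the failure described above (the 
class and library names are hypothetical, not taken from the report):

{code}
import org.apache.flink.api.common.functions.RichMapFunction;

public class LegacyNativeMapper extends RichMapFunction<String, String> {

    static {
        // When the cached user-code class loader is released during region recovery and a new
        // one is created, this static initializer runs again in the new loader and the JVM throws:
        // UnsatisfiedLinkError: Native Library ... already loaded in another classloader
        System.loadLibrary("legacy");
    }

    @Override
    public String map(String value) {
        return value; // the real implementation would call into the C library via JNI
    }
}
{code}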



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13445) Distinguishing Memory Configuration for TaskManager and JobManager

2019-08-29 Thread David Moravek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918732#comment-16918732
 ] 

David Moravek commented on FLINK-13445:
---

Hi [~StephanEwen], we're hitting the same issue on YARN and this should be 
fairly easy to fix. We have a large `containerized.heap-cutoff-min` for 
taskmanagers (let's say 2GB), because we need to allocate large chunks of 
non-Flink-managed memory off-heap. This doesn't allow us to allocate a jobmanager 
with less than 2GB, which is wasteful.

I'll send a PR that introduces `jobmanager.containerized.heap-cutoff-min`, 
which would fall back to `containerized.heap-cutoff-min` and would allow you to 
override this option for the jobmanager. WDYT?
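A hedged sketch of the proposed fallback resolution; the jobmanager.* key is the one 
proposed above (it does not exist in Flink), and the helper itself is purely 
illustrative:

{code}
import org.apache.flink.configuration.Configuration;

final class JobManagerHeapCutoff {

    /** Resolves the JM-specific cutoff, falling back to the shared containerized.heap-cutoff-min. */
    static long cutoffMinMb(Configuration conf) {
        long sharedCutoffMin = conf.getLong("containerized.heap-cutoff-min", 600L);
        return conf.getLong("jobmanager.containerized.heap-cutoff-min", sharedCutoffMin);
    }
}
{code}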

> Distinguishing Memory Configuration for TaskManager and JobManager
> --
>
> Key: FLINK-13445
> URL: https://issues.apache.org/jira/browse/FLINK-13445
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Configuration
>Affects Versions: 1.8.1
>Reporter: madong
>Priority: Major
>
> We use Flink to run some jobs built in a non-Java language, so we increase the 
> value of `containerized.heap-cutoff-ratio` to reserve more memory for the 
> non-Java process, which also affects the memory allocation for the jobManager. 
> Considering the different behaviors of the taskManager and jobManager, should we 
> be able to set this configuration separately?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13550) Support for CPU FlameGraphs in new web UI

2019-08-05 Thread David Moravek (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900143#comment-16900143
 ] 

David Moravek commented on FLINK-13550:
---

[~vthinkxie] Great! I'll ping you once I have the REST endpoint ready ;) Thanks

> Support for CPU FlameGraphs in new web UI
> -
>
> Key: FLINK-13550
> URL: https://issues.apache.org/jira/browse/FLINK-13550
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: David Moravek
>Priority: Major
>
> For better insight into a running job, it would be useful to have the ability 
> to render a CPU flame graph for a particular job vertex.
> Flink already has a stack-trace sampling mechanism in place, so this should be 
> straightforward to implement.
> This should be done by implementing a new REST API endpoint, which would sample 
> the stack traces the same way the current BackPressureTracker does, only with a 
> different sampling rate and sampling duration.
> [Here|https://www.youtube.com/watch?v=GUNDehj9z9o] is a little demo of the 
> feature.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13550) Support for CPU FlameGraphs in new web UI

2019-08-02 Thread David Moravek (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898701#comment-16898701
 ] 

David Moravek commented on FLINK-13550:
---

[Original 
thread|https://lists.apache.org/thread.html/8d0c0661c3587a485995f24f4a95841680535d9654534c207517e800@%3Cdev.flink.apache.org%3E].

> Support for CPU FlameGraphs in new web UI
> -
>
> Key: FLINK-13550
> URL: https://issues.apache.org/jira/browse/FLINK-13550
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: David Moravek
>Priority: Major
>
> For better insight into a running job, it would be useful to have the ability 
> to render a CPU flame graph for a particular job vertex.
> Flink already has a stack-trace sampling mechanism in place, so this should be 
> straightforward to implement.
> This should be done by implementing a new REST API endpoint, which would sample 
> the stack traces the same way the current BackPressureTracker does, only with a 
> different sampling rate and sampling duration.
> [Here|https://www.youtube.com/watch?v=GUNDehj9z9o] is a little demo of the 
> feature.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13552) Render vertex FlameGraph in web UI

2019-08-02 Thread David Moravek (JIRA)
David Moravek created FLINK-13552:
-

 Summary: Render vertex FlameGraph in web UI
 Key: FLINK-13552
 URL: https://issues.apache.org/jira/browse/FLINK-13552
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Web Frontend
Reporter: David Moravek


Add a new FlameGraph tab to the "vertex detail" page that will actively poll the flame 
graph endpoint and render the result using the d3 library.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13551) Add vertex FlameGraph REST endpoint

2019-08-02 Thread David Moravek (JIRA)
David Moravek created FLINK-13551:
-

 Summary: Add vertex FlameGraph REST endpoint
 Key: FLINK-13551
 URL: https://issues.apache.org/jira/browse/FLINK-13551
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / REST
Reporter: David Moravek


Add a new endpoint that returns data for flame graph rendering.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13550) Support for CPU FlameGraphs in new web UI

2019-08-02 Thread David Moravek (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Moravek updated FLINK-13550:
--
Component/s: Runtime / REST

> Support for CPU FlameGraphs in new web UI
> -
>
> Key: FLINK-13550
> URL: https://issues.apache.org/jira/browse/FLINK-13550
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / REST, Runtime / Web Frontend
>Reporter: David Moravek
>Priority: Major
>
> For better insight into a running job, it would be useful to have the ability 
> to render a CPU flame graph for a particular job vertex.
> Flink already has a stack-trace sampling mechanism in place, so this should be 
> straightforward to implement.
> This should be done by implementing a new REST API endpoint, which would sample 
> the stack traces the same way the current BackPressureTracker does, only with a 
> different sampling rate and sampling duration.
> [Here|https://www.youtube.com/watch?v=GUNDehj9z9o] is a little demo of the 
> feature.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13550) Support for CPU FlameGraphs in new web UI

2019-08-02 Thread David Moravek (JIRA)
David Moravek created FLINK-13550:
-

 Summary: Support for CPU FlameGraphs in new web UI
 Key: FLINK-13550
 URL: https://issues.apache.org/jira/browse/FLINK-13550
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Web Frontend
Reporter: David Moravek


For better insight into a running job, it would be useful to have the ability to 
render a CPU flame graph for a particular job vertex.

Flink already has a stack-trace sampling mechanism in place, so this should be 
straightforward to implement.

This should be done by implementing a new REST API endpoint, which would sample 
the stack traces the same way the current BackPressureTracker does, only with a 
different sampling rate and sampling duration.
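For illustration, a hedged sketch of the sampling and folding idea only; the real 
endpoint would reuse Flink's existing stack-trace sampling machinery rather than 
ThreadMXBean directly, and the class below is hypothetical:

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class FlameGraphSampler {

    /** Collects one sample per thread and folds it into "frameA;frameB;frameC -> count" form. */
    public static Map<String, Long> sampleOnce() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Long> folded = new HashMap<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            StackTraceElement[] trace = info.getStackTrace();
            StringBuilder stack = new StringBuilder();
            for (int i = trace.length - 1; i >= 0; i--) { // root frame first, leaf frame last
                if (stack.length() > 0) {
                    stack.append(';');
                }
                stack.append(trace[i].getClassName()).append('.').append(trace[i].getMethodName());
            }
            folded.merge(stack.toString(), 1L, Long::sum);
        }
        return folded;
    }
}
{code}

Repeating sampleOnce() at the chosen sampling rate for the chosen duration and merging 
the maps yields the data a flame graph renderer needs.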

[Here|https://www.youtube.com/watch?v=GUNDehj9z9o] is a little demo of the 
feature.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13369) Recursive closure cleaner ends up with stackOverflow in case of circular dependency

2019-07-22 Thread David Moravek (JIRA)
David Moravek created FLINK-13369:
-

 Summary: Recursive closure cleaner ends up with stackOverflow in 
case of circular dependency
 Key: FLINK-13369
 URL: https://issues.apache.org/jira/browse/FLINK-13369
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.8.1, 1.9.0
Reporter: David Moravek






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13367) Make ClosureCleaner detect writeReplace serialization override

2019-07-22 Thread David Moravek (JIRA)
David Moravek created FLINK-13367:
-

 Summary: Make ClosureCleaner detect writeReplace serialization 
override
 Key: FLINK-13367
 URL: https://issues.apache.org/jira/browse/FLINK-13367
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.8.1, 1.9.0
Reporter: David Moravek


The nested ClosureCleaner introduced in FLINK-12297 does not respect [writeReplace 
serialization 
overrides|https://docs.oracle.com/javase/8/docs/api/java/io/Serializable.html]. 
This is a problem for Apache Beam, which takes advantage of this when 
serializing Avro schemas.
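For context, a small sketch of how a closure cleaner could detect the override before 
descending into an object's fields (a hypothetical helper, not Flink's actual 
ClosureCleaner code):

{code}
import java.io.Serializable;

final class WriteReplaceDetector {

    /** Returns true if the class (or a serializable superclass) defines writeReplace(). */
    static boolean usesWriteReplace(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && Serializable.class.isAssignableFrom(c); c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("writeReplace");
                return true; // serialization substitutes another object; don't clean this one's fields
            } catch (NoSuchMethodException ignored) {
                // keep looking up the class hierarchy
            }
        }
        return false;
    }
}
{code}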

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13127) "--yarnship" doesn't support resource classloading

2019-07-08 Thread David Moravek (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880494#comment-16880494
 ] 

David Moravek commented on FLINK-13127:
---

Hello Yang, yes, you can definitely do this. The problem is when you're using 
existing libraries that rely on classloader-based resource loading. Also, the yarnship 
behavior is now inconsistent, because classloading works for archives, but not for 
resources (*which are added to the classpath too, but in an incorrect format*).

 

I think the proper behavior would be (sketched below):
 * We traverse the ship directory recursively
 ** If we find an archive, we add it to an archives list
 ** If we find a resource, we resolve its parent directory and add it to a 
resourceDirectories set
 * We construct the classpath for the shipped files as follows: 
sorted(resourceDirectories) + sorted(archives)
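A sketch of the traversal under those rules (the file-layout assumptions and helper 
names are mine, not the actual YarnClusterDescriptor code):

{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ShipClasspathBuilder {

    /** Builds the classpath fragment for shipped files: sorted resource directories, then sorted archives. */
    static String buildClasspath(Path shipDir) throws IOException {
        TreeSet<String> resourceDirectories = new TreeSet<>();
        TreeSet<String> archives = new TreeSet<>();
        try (Stream<Path> files = Files.walk(shipDir)) {
            files.filter(Files::isRegularFile).forEach(file -> {
                String name = file.getFileName().toString();
                if (name.endsWith(".jar") || name.endsWith(".zip")) {
                    archives.add(file.toString());                        // archives go on the classpath directly
                } else {
                    resourceDirectories.add(file.getParent().toString()); // resources contribute their directory
                }
            });
        }
        return Stream.concat(resourceDirectories.stream(), archives.stream())
                .collect(Collectors.joining(File.pathSeparator));
    }
}
{code}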

 

I'll provide a patch for this today.

 

> "--yarnship" doesn't support resource classloading
> --
>
> Key: FLINK-13127
> URL: https://issues.apache.org/jira/browse/FLINK-13127
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.8.1
>Reporter: David Moravek
>Priority: Major
>
> Currently yarnship works as follows:
>  * the user specifies a directory to ship with the job
>  * YARN ships it with the container
>  * org.apache.flink.yarn.AbstractYarnClusterDescriptor#uploadAndRegisterFiles 
> traverses the directory recursively and adds each file to the classpath
> This works well for shipping jars, but doesn't work correctly for shipping 
> resources that we want to load using the java.lang.ClassLoader#getResource method.
> In order to make resource classloading work, we need to register the resource's 
> directory instead of the file itself (the Java classpath expects directories or 
> archives).
> CLASSPATH="shipped/custom.conf:${CLASSPATH}" needs to become 
> CLASSPATH="shipped:${CLASSPATH}"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13127) "--yarnship" doesn't support resource classloading

2019-07-05 Thread David Moravek (JIRA)
David Moravek created FLINK-13127:
-

 Summary: "--yarnship" doesn't support resource classloading
 Key: FLINK-13127
 URL: https://issues.apache.org/jira/browse/FLINK-13127
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.8.1
Reporter: David Moravek


Currently yarnship works as follows:
 * the user specifies a directory to ship with the job
 * YARN ships it with the container
 * org.apache.flink.yarn.AbstractYarnClusterDescriptor#uploadAndRegisterFiles 
traverses the directory recursively and adds each file to the classpath

This works well for shipping jars, but doesn't work correctly for shipping 
resources that we want to load using the java.lang.ClassLoader#getResource method.

In order to make resource classloading work, we need to register the resource's 
directory instead of the file itself (the Java classpath expects directories or 
archives).

CLASSPATH="shipped/custom.conf:${CLASSPATH}" needs to become 
CLASSPATH="shipped:${CLASSPATH}"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-4232) Flink executable does not return correct pid

2016-07-19 Thread David Moravek (JIRA)
David Moravek created FLINK-4232:


 Summary: Flink executable does not return correct pid
 Key: FLINK-4232
 URL: https://issues.apache.org/jira/browse/FLINK-4232
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: David Moravek
Priority: Minor


E.g. when using supervisor, the pid returned by ./bin/flink is the pid of the shell 
executable instead of the Java process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)