[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-11 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082619#comment-16082619 ] Steven Zhen Wu commented on FLINK-7143: --- [~aljoscha] [~tzulitai] agree that {{partitionId %

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Created] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-7143: - Summary: Partition assignment for Kafka consumer is not stable Key: FLINK-7143 URL: https://issues.apache.org/jira/browse/FLINK-7143 Project: Flink Issue

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Updated] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-7143: -- Description: while deploying Flink 1.3 release to hundreds of routing jobs, we found some

[jira] [Comment Edited] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081420#comment-16081420 ] Steven Zhen Wu edited comment on FLINK-7143 at 7/11/17 4:51 AM: regarding

[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081665#comment-16081665 ] Steven Zhen Wu commented on FLINK-7143: --- I see. change is for support of multiple topics. hashCode()

[jira] [Commented] (FLINK-7143) Partition assignment for Kafka consumer is not stable

2017-07-10 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081420#comment-16081420 ] Steven Zhen Wu commented on FLINK-7143: --- regarding test coverage, we need a test that verify the

[jira] [Commented] (FLINK-6301) Flink KafkaConnector09 leaks memory on reading compressed messages due to a Kafka consumer bug

2017-07-21 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097121#comment-16097121 ] Steven Zhen Wu commented on FLINK-6301: --- [~vidhu5269] is this bug only related to gzip compression?

[jira] [Created] (FLINK-8042) retry individual failover-strategy for some time first before reverting to full job restart

2017-11-09 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-8042: - Summary: retry individual failover-strategy for some time first before reverting to full job restart Key: FLINK-8042 URL: https://issues.apache.org/jira/browse/FLINK-8042

[jira] [Created] (FLINK-8043) increment job restart metric when fine grained recovery reverted to full job restart

2017-11-09 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-8043: - Summary: increment job restart metric when fine grained recovery reverted to full job restart Key: FLINK-8043 URL: https://issues.apache.org/jira/browse/FLINK-8043

[jira] [Updated] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Attachment: (was: image (1).png) > possible memory link in jobmanager retaining archived

[jira] [Updated] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Description: First, some context about the job * Flink 1.4.1 * embarrassingly parallel: all

[jira] [Updated] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Attachment: (was: image.png) > possible memory link in jobmanager retaining archived

[jira] [Created] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-9693: - Summary: possible memory link in jobmanager retaining archived checkpoints Key: FLINK-9693 URL: https://issues.apache.org/jira/browse/FLINK-9693 Project: Flink

[jira] [Updated] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Description: First, some context about the job * Flink 1.4.1 * stand-alone deployment mode

[jira] [Updated] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Attachment: ExecutionVertexZoomIn.png > possible memory link in jobmanager retaining archived

[jira] [Updated] (FLINK-9693) possible memory link in jobmanager retaining archived checkpoints

2018-06-29 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Attachment: 41K_ExecutionVertex_objs_retained_9GB.png > possible memory link in jobmanager

[jira] [Assigned] (FLINK-9061) add entropy to s3 path for better scalability

2018-05-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu reassigned FLINK-9061: - Assignee: Indrajit Roychoudhury Summary: add entropy to s3 path for better

[jira] [Updated] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2017-12-24 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-8043: -- Description: Fine grained recovery publish fullRestarts as guage, which is not suitable for

[jira] [Updated] (FLINK-8042) retry individual failover-strategy for some time first before reverting to full job restart

2018-01-18 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-8042: -- Priority: Blocker (was: Major) Fix Version/s: 1.4.1 > retry individual

[jira] [Commented] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-18 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331007#comment-16331007 ] Steven Zhen Wu commented on FLINK-8043: --- [~till.rohrmann] one question: why is _fullRestarts_ a

[jira] [Updated] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-18 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-8043: -- Priority: Blocker (was: Major) Fix Version/s: 1.4.1 > change fullRestarts (for fine

[jira] [Comment Edited] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-18 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331007#comment-16331007 ] Steven Zhen Wu edited comment on FLINK-8043 at 1/18/18 9:12 PM:

[jira] [Comment Edited] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-19 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331007#comment-16331007 ] Steven Zhen Wu edited comment on FLINK-8043 at 1/19/18 3:24 PM:

[jira] [Created] (FLINK-8506) fullRestarts Gauge not incremented when jobmanager got killed

2018-01-24 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-8506: - Summary: fullRestarts Gauge not incremented when jobmanager got killed Key: FLINK-8506 URL: https://issues.apache.org/jira/browse/FLINK-8506 Project: Flink

[jira] [Commented] (FLINK-8506) fullRestarts Gauge not incremented when jobmanager got killed

2018-01-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339560#comment-16339560 ] Steven Zhen Wu commented on FLINK-8506: --- Till, thanks for the explanation. Looks like we should

[jira] [Comment Edited] (FLINK-8506) fullRestarts Gauge not incremented when jobmanager got killed

2018-01-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16339560#comment-16339560 ] Steven Zhen Wu edited comment on FLINK-8506 at 1/25/18 6:45 PM: Till,

[jira] [Closed] (FLINK-8506) fullRestarts Gauge not incremented when jobmanager got killed

2018-02-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu closed FLINK-8506. - Resolution: Not A Problem > fullRestarts Gauge not incremented when jobmanager got killed >

[jira] [Updated] (FLINK-8042) Retry individual failover-strategy for some time first before reverting to full job restart

2018-02-12 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-8042: -- Description: Let's say we lost a taskmanager node. When Flink tries to attempt fine grained

[jira] [Commented] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310567#comment-16310567 ] Steven Zhen Wu commented on FLINK-8043: --- [~trohrm...@apache.org] any objection/comment on changing

[jira] [Comment Edited] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556325#comment-16556325 ] Steven Zhen Wu edited comment on FLINK-9693 at 7/26/18 5:15 AM: We can

[jira] [Comment Edited] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556325#comment-16556325 ] Steven Zhen Wu edited comment on FLINK-9693 at 7/27/18 12:57 AM: - We can

[jira] [Commented] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556325#comment-16556325 ] Steven Zhen Wu commented on FLINK-9693: --- One more observation. we are seeing this issue right after

[jira] [Comment Edited] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556325#comment-16556325 ] Steven Zhen Wu edited comment on FLINK-9693 at 7/25/18 9:56 PM: One more

[jira] [Comment Edited] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556325#comment-16556325 ] Steven Zhen Wu edited comment on FLINK-9693 at 7/25/18 11:23 PM: - We can

[jira] [Closed] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-08-11 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu closed FLINK-9693. - Resolution: Fixed close this jira since [~srichter] have addressed remaining issues in other

[jira] [Reopened] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu reopened FLINK-9693: --- Till, I cherry picked your fix for 1.4 branch. we are still seeing the memory leak issue. will

[jira] [Updated] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-25 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-9693: -- Attachment: 20180725_jm_mem_leak.png > Possible memory leak in jobmanager retaining archived

[jira] [Commented] (FLINK-9693) Possible memory leak in jobmanager retaining archived checkpoints

2018-07-09 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537514#comment-16537514 ] Steven Zhen Wu commented on FLINK-9693: --- [~till.rohrmann] is it possible to generate a patch for

[jira] [Commented] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-22 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334640#comment-16334640 ] Steven Zhen Wu commented on FLINK-8043: --- [~till.rohrmann] thx for the explanation. let me close this

[jira] [Closed] (FLINK-8043) change fullRestarts (for fine grained recovery) from guage to counter

2018-01-22 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu closed FLINK-8043. - Resolution: Invalid > change fullRestarts (for fine grained recovery) from guage to counter >

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-04 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426124#comment-16426124 ] Steven Zhen Wu commented on FLINK-9061: --- [~jgrier] Amazon doesn't want to reveal internal details,

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-04 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426124#comment-16426124 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/4/18 8:20 PM: --- [~jgrier] 

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-04 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426124#comment-16426124 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/4/18 8:24 PM: --- [~jgrier] 

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 4:31 PM:

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 4:27 PM:

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 4:27 PM:

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 4:27 PM:

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu commented on FLINK-9061: --- [~StephanEwen] [~jgrier] We run into S3 throttling issue

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 4:30 PM:

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-24 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412814#comment-16412814 ] Steven Zhen Wu commented on FLINK-9061: --- [~jgrier] Yes, we want to contribute this back. We can

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-23 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412326#comment-16412326 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/24/18 1:02 AM: Jamie,

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-23 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412326#comment-16412326 ] Steven Zhen Wu commented on FLINK-9061: --- Jamie, yes, we run into the same issue at Netflix. We did

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422843#comment-16422843 ] Steven Zhen Wu commented on FLINK-9061: --- Usually 4-char random prefix can go a long way. Even 2-char

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422843#comment-16422843 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/2/18 5:50 PM: --- Usually

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422976#comment-16422976 ] Steven Zhen Wu commented on FLINK-9061: --- I think S3 has more sophisticated pattern searching for

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422976#comment-16422976 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/2/18 9:22 PM: --- I think S3

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423080#comment-16423080 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/2/18 8:25 PM: --- reversing

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423080#comment-16423080 ] Steven Zhen Wu commented on FLINK-9061: --- reversing the components (split by slash char) doesn't give 

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423335#comment-16423335 ] Steven Zhen Wu commented on FLINK-9061: --- I don't know if it has to be "the very first characters".

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-02 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423335#comment-16423335 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/3/18 5:53 AM: --- I don't know

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/3/18 9:24 PM: --- [~jgrier]

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/3/18 9:22 PM: --- [~jgrier]

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu commented on FLINK-9061: --- [~jgrier] [~StephanEwen] Here are our thinking. if you

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424552#comment-16424552 ] Steven Zhen Wu commented on FLINK-9061: --- it seems that S3 walk through the prefix from left to right

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/3/18 9:23 PM: --- [~jgrier]

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/3/18 9:23 PM: --- [~jgrier]

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-04-03 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424625#comment-16424625 ] Steven Zhen Wu edited comment on FLINK-9061 at 4/3/18 9:43 PM: --- [~jgrier]

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 10:21 PM: -

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 10:24 PM: -

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 10:28 PM: -

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 10:34 PM: -

[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414696#comment-16414696 ] Steven Zhen Wu commented on FLINK-9061: --- It seems that our internal change *only* works with

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 11:38 PM: -

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 11:39 PM: -

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414696#comment-16414696 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 11:46 PM: - It seems

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-26 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414085#comment-16414085 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/26/18 11:49 PM: -

[jira] [Comment Edited] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance

2018-03-23 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412326#comment-16412326 ] Steven Zhen Wu edited comment on FLINK-9061 at 3/24/18 3:06 AM: Jamie,

[jira] [Created] (FLINK-10774) connection leak when partition discovery is disabled and open throws exception

2018-11-04 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-10774: -- Summary: connection leak when partition discovery is disabled and open throws exception Key: FLINK-10774 URL: https://issues.apache.org/jira/browse/FLINK-10774

[jira] [Commented] (FLINK-10774) connection leak when partition discovery is disabled and open throws exception

2018-11-04 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674471#comment-16674471 ] Steven Zhen Wu commented on FLINK-10774: [~srichter]  FYI. I will submit a patch soon >

[jira] [Commented] (FLINK-11196) Extend S3 EntropyInjector to use key replacement (instead of key removal) when creating checkpoint metadata files

2019-01-16 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744345#comment-16744345 ] Steven Zhen Wu commented on FLINK-11196: [~StephanEwen] can you take a look at the Jira and PR

[jira] [Commented] (FLINK-11195) Extend AbstractS3FileSystemFactory.createHadoopFileSystem to accept URI and Hadoop Configuration

2019-01-16 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-11195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744346#comment-16744346 ] Steven Zhen Wu commented on FLINK-11195: [~StephanEwen] can you take a look at this Jira and PR

[jira] [Created] (FLINK-10970) expose metric for total state size in terms of bytes

2018-11-21 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-10970: -- Summary: expose metric for total state size in terms of bytes Key: FLINK-10970 URL: https://issues.apache.org/jira/browse/FLINK-10970 Project: Flink

[jira] [Created] (FLINK-10969) expose API or metric for total number of keys stored in state backend

2018-11-21 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-10969: -- Summary: expose API or metric for total number of keys stored in state backend Key: FLINK-10969 URL: https://issues.apache.org/jira/browse/FLINK-10969 Project:

[jira] [Updated] (FLINK-10969) expose API or metric for total number of keys stored in state backend

2018-11-21 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-10969: --- Description: [~srichter] mentioned it might make sense to provide two versions: exact count

[jira] [Commented] (FLINK-7883) Make savepoints atomic with respect to state and side effects

2018-11-24 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-7883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697931#comment-16697931 ] Steven Zhen Wu commented on FLINK-7883: --- We would love to see this happening. It is the "graceful"

[jira] [Commented] (FLINK-10452) Expose Additional Metrics to Reason about Statesize

2018-11-22 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696156#comment-16696156 ] Steven Zhen Wu commented on FLINK-10452: I saw two tickets that I filed are marked as duplicate

[jira] [Created] (FLINK-10360) support timeout in savepoint REST api

2018-09-17 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-10360: -- Summary: support timeout in savepoint REST api Key: FLINK-10360 URL: https://issues.apache.org/jira/browse/FLINK-10360 Project: Flink Issue Type:

[jira] [Updated] (FLINK-10360) support timeout in savepoint REST api

2018-09-18 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-10360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-10360: --- Description: Right now, savepoints share the same timeout config as checkpoints. With

[jira] [Updated] (FLINK-12781) run job REST api doesn't return complete stack trace for start job failure

2019-06-07 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-12781: --- Description: We use REST api to start a job in Flink cluster.

[jira] [Updated] (FLINK-12781) run job REST api doesn't return complete stack trace for start job failure

2019-06-07 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-12781: --- Description: We use REST api to start a job in Flink cluster.

[jira] [Created] (FLINK-12781) run job REST api doesn't return complete stack trace for start job failure

2019-06-07 Thread Steven Zhen Wu (JIRA)
Steven Zhen Wu created FLINK-12781: -- Summary: run job REST api doesn't return complete stack trace for start job failure Key: FLINK-12781 URL: https://issues.apache.org/jira/browse/FLINK-12781

[jira] [Updated] (FLINK-12781) run job REST api doesn't return complete stack trace for start job failure

2019-06-07 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-12781: --- Description: We use REST api to start a job in Flink cluster.

[jira] [Updated] (FLINK-12781) run job REST api doesn't return complete stack trace for start job failure

2019-06-07 Thread Steven Zhen Wu (JIRA)
[ https://issues.apache.org/jira/browse/FLINK-12781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Zhen Wu updated FLINK-12781: --- Description: We use REST api to start a job in Flink cluster.
