[jira] [Commented] (FLINK-16572) CheckPubSubEmulatorTest is flaky on Azure

2020-05-26 Thread Richard Deurwaarder (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116530#comment-17116530
 ] 

Richard Deurwaarder commented on FLINK-16572:
-

Could it be that this was run on an older code base? The stacktrace 
([https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2162=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5=6715)]
 refers to:
{code:java}
at 
org.apache.flink.streaming.connectors.gcp.pubsub.CheckPubSubEmulatorTest.testPull(CheckPubSubEmulatorTest.java:78)
{code}
That line used to contain the assert, but it does not anymore: 

[https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-connector-gcp-pubsub-emulator-tests/src/test/java/org/apache/flink/streaming/connectors/gcp/pubsub/CheckPubSubEmulatorTest.java#L78]

> CheckPubSubEmulatorTest is flaky on Azure
> -
>
> Key: FLINK-16572
> URL: https://issues.apache.org/jira/browse/FLINK-16572
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub, Tests
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Assignee: Richard Deurwaarder
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Log: 
> https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=56=logs=1f3ed471-1849-5d3c-a34c-19792af4ad16=ce095137-3e3b-5f73-4b79-c42d3d5f8283=7842



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16572) CheckPubSubEmulatorTest is flaky on Azure

2020-05-23 Thread Richard Deurwaarder (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114716#comment-17114716
 ] 

Richard Deurwaarder commented on FLINK-16572:
-

Apologies for the delay [~rmetzger], I had some struggles getting Flink to 
build locally for some reason, which doesn't help when trying to quickly add 
some changes... :P

 

I've added a retry to the test to see whether, after 60s, it does find the missing 
message: [https://github.com/apache/flink/pull/12301]

Not something we want to keep in the test code permanently, but this should give 
us some information on whether the publishing part or the pulling part is going 
wrong.
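
For context, the retry is roughly of the following shape. This is a simplified 
sketch rather than the exact PR code; pullMessages() stands in for the test's own 
helper that pulls from the emulator.
{code:java}
// Simplified sketch of the retry idea (not the exact PR code).
@Test
public void testPullWithRetry() throws Exception {
    List<ReceivedMessage> received = new ArrayList<>();
    long deadline = System.currentTimeMillis() + 60_000L; // keep retrying for up to 60 seconds
    while (received.isEmpty() && System.currentTimeMillis() < deadline) {
        received.addAll(pullMessages()); // placeholder for the emulator pull helper
        if (received.isEmpty()) {
            Thread.sleep(1_000L); // small pause before the next attempt
        }
    }
    assertEquals(1, received.size());
}
{code}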

> CheckPubSubEmulatorTest is flaky on Azure
> -
>
> Key: FLINK-16572
> URL: https://issues.apache.org/jira/browse/FLINK-16572
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub, Tests
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Assignee: Richard Deurwaarder
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Log: 
> https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=56=logs=1f3ed471-1849-5d3c-a34c-19792af4ad16=ce095137-3e3b-5f73-4b79-c42d3d5f8283=7842



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16572) CheckPubSubEmulatorTest is flaky on Azure

2020-05-07 Thread Richard Deurwaarder (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101614#comment-17101614
 ] 

Richard Deurwaarder commented on FLINK-16572:
-

Yeah, I'll have another look. Reproducing these locally is quite difficult 
though; is it okay if I trigger the build on Azure a couple of times with debug 
statements? I think I can trigger it by just doing dummy commits on an MR, right?

> CheckPubSubEmulatorTest is flaky on Azure
> -
>
> Key: FLINK-16572
> URL: https://issues.apache.org/jira/browse/FLINK-16572
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Connectors / Google 
> Cloud PubSub
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Assignee: Richard Deurwaarder
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Log: 
> https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=56=logs=1f3ed471-1849-5d3c-a34c-19792af4ad16=ce095137-3e3b-5f73-4b79-c42d3d5f8283=7842



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16572) CheckPubSubEmulatorTest is flaky on Azure

2020-04-06 Thread Richard Deurwaarder (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076169#comment-17076169
 ] 

Richard Deurwaarder commented on FLINK-16572:
-

Sure, I'll have a look at it this week! (I can't assign it to myself but feel 
free to :))

> CheckPubSubEmulatorTest is flaky on Azure
> -
>
> Key: FLINK-16572
> URL: https://issues.apache.org/jira/browse/FLINK-16572
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Connectors / Google 
> Cloud PubSub
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.11.0
>
>
> Log: 
> https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=56=logs=1f3ed471-1849-5d3c-a34c-19792af4ad16=ce095137-3e3b-5f73-4b79-c42d3d5f8283=7842



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16142) Memory Leak causes Metaspace OOM error on repeated job submission

2020-03-04 Thread Richard Deurwaarder (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051013#comment-17051013
 ] 

Richard Deurwaarder commented on FLINK-16142:
-

[~sewen] you mention this:
{noformat}
I am wondering if this is how the JVM behaves? Class unloading has a slight 
delay and thus extremely fast cancel/resubmit cycles can lead to this?
If you increase metaspace by a bit, does it prevent this?"{noformat}
We've seen (and still often see) that exact same behavior on our jobs. We mailed 
our findings about this a while back here: 
[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-constrains-running-Flink-on-Kubernetes-td28954.html]

Just mentioning it as a possibly useful data point.
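
For anyone who wants to try the "increase metaspace by a bit" suggestion on 1.10: 
the TaskManager metaspace is a single config entry, something like the following 
(the value is just an example to experiment with):
{noformat}
taskmanager.memory.jvm-metaspace.size: 256m
{noformat}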

> Memory Leak causes Metaspace OOM error on repeated job submission
> -
>
> Key: FLINK-16142
> URL: https://issues.apache.org/jira/browse/FLINK-16142
> Project: Flink
>  Issue Type: Bug
>  Components: Client / Job Submission
>Affects Versions: 1.10.0
>Reporter: Thomas Wozniakowski
>Priority: Blocker
> Fix For: 1.10.1, 1.11.0
>
> Attachments: Leak-GC-root.png, java_pid1.hprof, java_pid1.hprof
>
>
> Hi Guys,
> We've just tried deploying 1.10.0 as it has lots of shiny stuff that fits our 
> use-case exactly (RocksDB state backend running in a containerised cluster). 
> Unfortunately, it seems like there is a memory leak somewhere in the job 
> submission logic. We are getting this error:
> {code:java}
> 2020-02-18 10:22:10,020 INFO 
> org.apache.flink.runtime.executiongraph.ExecutionGraph - OPERATOR_NAME 
> switched from RUNNING to FAILED.
> java.lang.OutOfMemoryError: Metaspace
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> at 
> org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:60)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.jmx.SdkMBeanRegistrySupport.registerMetricAdminMBean(SdkMBeanRegistrySupport.java:27)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:398)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.metrics.AwsSdkMetrics.(AwsSdkMetrics.java:359)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:728)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:660)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:652)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:611)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:606)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.executeListShards(AmazonKinesisClient.java:1534)
> at 
> org.apache.flink.kinesis.shaded.com.amazonaws.services.kinesis.AmazonKinesisClient.listShards(AmazonKinesisClient.java:1528)
> at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.listShards(KinesisProxy.java:439)
> at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getShardsOfStream(KinesisProxy.java:389)
> at 
> org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getShardList(KinesisProxy.java:279)
> at 
> org.apache.flink.streaming.connectors.kinesis.internals.KinesisDataFetcher.discoverNewShardsToSubscribe(KinesisDataFetcher.java:686)
> at 
> org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.run(FlinkKinesisConsumer.java:287)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> {code}
> (The only change in the above text is the OPERATOR_NAME text where I removed 
> some of the internal specifics of our system).
> This will reliably happen on a fresh cluster after submitting and cancelling 
> our job 3 times.
> We are using the presto-s3 

[jira] [Commented] (FLINK-13230) Retry acknowledgement calls

2019-09-29 Thread Richard Deurwaarder (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940490#comment-16940490
 ] 

Richard Deurwaarder commented on FLINK-13230:
-

[~jark] would it instead be possible to merge this in?

> Retry acknowledgement calls
> ---
>
> Key: FLINK-13230
> URL: https://issues.apache.org/jira/browse/FLINK-13230
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Google Cloud PubSub
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.9.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when a PubSub/gRPC pull fails, we retry based on the configuration 
> given by the user.
> We should do the same for acknowledgement calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-13306) flink-examples-streaming-gcp-pubsub is missing NOTICE

2019-08-13 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905976#comment-16905976
 ] 

Richard Deurwaarder commented on FLINK-13306:
-

I had not seen this discussion before creating FLINK-13700.

Unless anyone has changed their mind, shall I remove it altogether? The shaded 
example jar is actually quite big too (16 MB), so that might be another reason 
for not including it in the final dist.

> flink-examples-streaming-gcp-pubsub is missing NOTICE
> -
>
> Key: FLINK-13306
> URL: https://issues.apache.org/jira/browse/FLINK-13306
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub, Examples
>Affects Versions: 1.9.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
>
> The pubsub example is bundling various dependencies but is missing a NOTICE 
> file.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13700) PubSub connector example not included in flink-dist

2019-08-13 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-13700:
---

 Summary: PubSub connector example not included in flink-dist
 Key: FLINK-13700
 URL: https://issues.apache.org/jira/browse/FLINK-13700
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Google Cloud PubSub
Affects Versions: 1.9.0
Reporter: Richard Deurwaarder






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13083) Various improvements PubSub Connector

2019-08-08 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902969#comment-16902969
 ] 

Richard Deurwaarder commented on FLINK-13083:
-

I've just run our integration tests and ran some dummy jobs against the real 
thing, and it all works as intended!

> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> Umbrella task to keep track of issues remaining when FLINK-9311 was merged.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (FLINK-13083) Various improvements PubSub Connector

2019-08-08 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902969#comment-16902969
 ] 

Richard Deurwaarder edited comment on FLINK-13083 at 8/8/19 1:22 PM:
-

I've just run our integration tests and ran some dummy jobs against the real 
thing, and it all works as intended!

 

PS: The fancy new bot complains that the Jira issues are not assigned; could you 
assign them both to me?


was (Author: xeli):
I've just ran out integration tests and ran some dummy jobs against the real 
thing and it all works as intended!

> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> Umbrella task to keep track of issues remaining when FLINK-9311 was merged.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13083) Various improvements PubSub Connector

2019-08-07 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902456#comment-16902456
 ] 

Richard Deurwaarder commented on FLINK-13083:
-

Hi [~becket_qin], apologies for the late response. I had some misbehaving flink 
jobs at work taking up all my spare time :)

 

I've added 2 PRs to fix the remaining two Jira tickets. Tomorrow I'm gonna run 
the integration test suite we have for our own jobs with this code to make sure 
everything is fine, but other than that they should be good to go.

 

If you have time, could you have a look? 

> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> Umbrella task to keep track of issues remaining when FLINK-9311 was merged.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-10428) AbstractServerBase should bind to any address per default

2019-07-18 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887843#comment-16887843
 ] 

Richard Deurwaarder commented on FLINK-10428:
-

[~yanghua] did you have a chance to look at this issue?

 

When running a Flink cluster on Kubernetes we need to jump through a lot of 
hoops to get queryable state working; it would be really convenient if the 
queryable state proxy server bound to any/all interfaces.

> AbstractServerBase should bind to any address per default
> -
>
> Key: FLINK-10428
> URL: https://issues.apache.org/jira/browse/FLINK-10428
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Queryable State
>Affects Versions: 1.5.4, 1.6.1, 1.7.0
>Reporter: Till Rohrmann
>Assignee: vinoyang
>Priority: Major
>
> The {{AbstractServerBase}}, the base class of the queryable state servers, is 
> setup to bind to the same hostname as the {{RpcService}} of the 
> {{TaskExecutor}}. People ran into the problem that in certain setups where 
> you start Flink components locally or in the same container, that the 
> queryable state server binds to 127.0.0.1. Therefore it is not possible to 
> connect from the outside to this machine.
> I propose to bind per default to any hostname (0.0.0.0) and make it 
> configurable in case that a user wants to bind to a specific hostname.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13231) Add a ratelimiter to pubsub source

2019-07-12 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883594#comment-16883594
 ] 

Richard Deurwaarder commented on FLINK-13231:
-

[~becket_qin] I was looking into this. The rate limiter only allows for a 
global rate, while the PubSub connector would ideally have a 'local' or 
per-subtask rate.

What's your opinion on how to cope with this? I'm thinking about extending the 
rate limiter with a `setRatePerSubTask(int rate)` kind of method.
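
To make the idea a bit more concrete, the per-subtask behaviour could also be 
emulated by dividing a desired global rate by the parallelism. This is only a 
sketch; desiredGlobalRate and rateLimiter are placeholder names, and 
setRatePerSubTask(...) itself does not exist yet.
{code:java}
// Sketch only: derive a per-subtask rate from a desired overall rate, e.g. in open().
int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
long perSubtaskRate = Math.max(1L, desiredGlobalRate / parallelism);
rateLimiter.setRate(perSubtaskRate); // all subtasks together then stay under the global rate
{code}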

> Add a ratelimiter to pubsub source
> --
>
> Key: FLINK-13231
> URL: https://issues.apache.org/jira/browse/FLINK-13231
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Google Cloud PubSub
>Reporter: Richard Deurwaarder
>Priority: Minor
>
> Replace MaxMessagesToAcknowledge limit by introducing a rate limiter. See: 
> [https://github.com/apache/flink/pull/6594#discussion_r300215868]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (FLINK-13083) Various improvements PubSub Connector

2019-07-11 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883051#comment-16883051
 ] 

Richard Deurwaarder commented on FLINK-13083:
-

Hi [~becket_qin], I have not yet, except for the PubSub docs ticket. I've made 
subtasks for the remaining issues and will assign them to myself when I start 
working on them.

> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> Umbrella task to keep track of issues remaining when FLINK-9311 was merged.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13231) Add a ratelimiter to pubsub source

2019-07-11 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-13231:
---

 Summary: Add a ratelimiter to pubsub source
 Key: FLINK-13231
 URL: https://issues.apache.org/jira/browse/FLINK-13231
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Google Cloud PubSub
Reporter: Richard Deurwaarder


Replace MaxMessagesToAcknowledge limit by introducing a rate limiter. See: 
[https://github.com/apache/flink/pull/6594#discussion_r300215868]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13083) Various improvements PubSub Connector

2019-07-11 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder updated FLINK-13083:

Description: Umbrella task to keep track of issues remaining when 
FLINK-9311 was merged.  (was: * Add retry mechanism to Acknowledgement calls
 * Replace MaxMessagesToAcknowledge limit by introducing a rate limiter. See: 
https://github.com/apache/flink/pull/6594#discussion_r300215868)

> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> Umbrella task to keep track of issues remaining when FLINK-9311 was merged.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (FLINK-13230) Retry acknowledgement calls

2019-07-11 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-13230:
---

 Summary: Retry acknowledgement calls
 Key: FLINK-13230
 URL: https://issues.apache.org/jira/browse/FLINK-13230
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Google Cloud PubSub
Reporter: Richard Deurwaarder


Currently, when a PubSub/gRPC pull fails, we retry based on the configuration 
given by the user.

We should do the same for acknowledgement calls.
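
A rough sketch of what that could look like; acknowledgeIds() and maxRetries are 
placeholders, not existing connector APIs.
{code:java}
// Rough sketch only: retry acknowledgements the same way pulls are retried.
void acknowledgeWithRetries(List<String> ackIds, int maxRetries) throws Exception {
    Exception lastFailure = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            acknowledgeIds(ackIds); // placeholder for the gRPC acknowledge call
            return;                 // success, no retry needed
        } catch (Exception e) {
            lastFailure = e;        // remember the failure and try again
        }
    }
    throw lastFailure;              // give up after the configured number of retries
}
{code}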



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (FLINK-13133) Correct error in PubSub documentation

2019-07-07 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder updated FLINK-13133:

Component/s: Connectors / Google Cloud PubSub

> Correct error in PubSub documentation
> -
>
> Key: FLINK-13133
> URL: https://issues.apache.org/jira/browse/FLINK-13133
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
> Fix For: 1.9.0
>
>
> The documentation for PubSubSink 
> ([https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/pubsub.html#pubsub-sink])
>  incorrectly uses *De*serializationSchema; it should use a SerializationSchema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13133) Correct error in PubSub documentation

2019-07-07 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-13133:
---

 Summary: Correct error in PubSub documentation
 Key: FLINK-13133
 URL: https://issues.apache.org/jira/browse/FLINK-13133
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.9.0
Reporter: Richard Deurwaarder
Assignee: Richard Deurwaarder
 Fix For: 1.9.0


The documentation for PubSubSink 
([https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/pubsub.html#pubsub-sink])
 incorrectly uses *De*serializationSchema; it should use a SerializationSchema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-13083) Various improvements PubSub Connector

2019-07-04 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder updated FLINK-13083:

Description: 
* Add retry mechanism to Acknowledgement calls
 * Replace MaxMessagesToAcknowledge limit by introducing a rate limiter. See: 
https://github.com/apache/flink/pull/6594#discussion_r300215868

  was:* Add retry mechanism to Acknowledgement calls


> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> * Add retry mechanism to Acknowledgement calls
>  * Replace MaxMessagesToAcknowledge limit by introducing a rate limiter. See: 
> https://github.com/apache/flink/pull/6594#discussion_r300215868



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-13083) Various improvements PubSub Connector

2019-07-03 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder reassigned FLINK-13083:
---

Assignee: Richard Deurwaarder

> Various improvements PubSub Connector
> -
>
> Key: FLINK-13083
> URL: https://issues.apache.org/jira/browse/FLINK-13083
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: Richard Deurwaarder
>Assignee: Richard Deurwaarder
>Priority: Minor
>
> * Add retry mechanism to Acknowledgement calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-13083) Various improvements PubSub Connector

2019-07-03 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-13083:
---

 Summary: Various improvements PubSub Connector
 Key: FLINK-13083
 URL: https://issues.apache.org/jira/browse/FLINK-13083
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Google Cloud PubSub
Affects Versions: 1.9.0
Reporter: Richard Deurwaarder


* Add retry mechanism to Acknowledgement calls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12751) Create file based HA support

2019-06-27 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873907#comment-16873907
 ] 

Richard Deurwaarder commented on FLINK-12751:
-

I am not sure whether kubelet controls this, nor whether we would want to rely on 
it, but Kubernetes allows us to mount PVCs with specific access modes. So if the 
PVC is mounted as ReadWriteOnce, this would prevent multiple job managers from 
being up, right?

[https://kubernetes.io/docs/concepts/storage/persistent-volumes/]

> Create file based HA support
> 
>
> Key: FLINK-12751
> URL: https://issues.apache.org/jira/browse/FLINK-12751
> Project: Flink
>  Issue Type: Improvement
>  Components: FileSystems
>Affects Versions: 1.8.0, 1.9.0, 2.0.0
> Environment: Flink on k8 and Mini cluster
>Reporter: Boris Lublinsky
>Priority: Major
>  Labels: features, pull-request-available
>   Original Estimate: 168h
>  Time Spent: 10m
>  Remaining Estimate: 167h 50m
>
> In the current Flink implementation, HA support can be implemented either 
> using Zookeeper or Custom Factory class.
> Add HA implementation based on PVC. The idea behind this implementation
> is as follows:
> * Because implementation assumes a single instance of Job manager (Job 
> manager selection and restarts are done by K8 Deployment of 1)
> URL management is done using StandaloneHaServices implementation (in the case 
> of cluster) and EmbeddedHaServices implementation (in the case of mini 
> cluster)
> * For management of the submitted Job Graphs, checkpoint counter and 
> completed checkpoint an implementation is leveraging the following file 
> system layout
> {code}
>  ha -> root of the HA data
>  checkpointcounter -> checkpoint counter folder
>   -> job id folder
>   -> counter file
>   -> another job id folder
>  ...
>  completedCheckpoint -> completed checkpoint folder
>   -> job id folder
>   -> checkpoint file
>   -> checkpoint file
>  ...
>   -> another job id folder
>  ...
>  submittedJobGraph -> submitted graph folder
>   -> job id folder
>   -> graph file
>   -> another job id folder
>  ...
> {code}
> An implementation should overwrites 2 of the Flink files:
> * HighAvailabilityServicesUtils - added `FILESYSTEM` option for picking HA 
> service
> * HighAvailabilityMode - added `FILESYSTEM` to available HA options.
> The actual implementation adds the following classes:
> * `FileSystemHAServices` - an implementation of a `HighAvailabilityServices` 
> for file system
> * `FileSystemUtils` - support class for creation of runtime components.
> * `FileSystemStorageHelper` - file system operations implementation for 
> filesystem based HA
> * `FileSystemCheckpointRecoveryFactory` - an implementation of a 
> `CheckpointRecoveryFactory`for file system
> * `FileSystemCheckpointIDCounter` - an implementation of a 
> `CheckpointIDCounter` for file system
> * `FileSystemCompletedCheckpointStore` - an implementation of a 
> `CompletedCheckpointStore` for file system
> * `FileSystemSubmittedJobGraphStore` - an implementation of a 
> `SubmittedJobGraphStore` for file system



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12376) GCS runtime exn: Request payload size exceeds the limit

2019-05-08 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835427#comment-16835427
 ] 

Richard Deurwaarder edited comment on FLINK-12376 at 5/8/19 8:37 AM:
-

[~StephanEwen] PubSub has no concept of ordering. For better or worse, this is 
by design: [https://cloud.google.com/pubsub/docs/ordering]

What is happening is that gRPC has limits that are server-side and not exposed 
to the clients.

I've actually added code to counter the exact issue [~haf] sees: 
[https://github.com/apache/flink/pull/6594/files#diff-ea875742509cef8c6f26e1b488447130R125]
 This code has been inspired/copied from the go pubsub client here: 
[https://code-review.googlesource.com/c/gocloud/+/9758/2/pubsub/service.go]

In short: instead of acknowledging all ids at once, it splits them up into 
chunks of <500 KB and does multiple requests.

So one of two things might've happened:
 * Either [~haf] used a version of the connector that did not have this fix 
(could you confirm, [~haf]?)
 * The way the connector splits up acknowledgement ids is off; adjusting for 
overhead isn't the easiest in Java :(

I found this issue: [https://github.com/GoogleCloudPlatform/pubsub/pull/194] 
and they propose making the number of ids per request configurable in case 
PubSub changes this limit on their side.

I could make the connector split into smaller chunks and/or add this as a 
configuration option, but it's quite a technical and hard-to-tune option.

Any thoughts on how best to approach this?


was (Author: xeli):
[~StephanEwen] PubSub has no concept of ordering. This is by design: 
[https://cloud.google.com/pubsub/docs/ordering] 

 

So what is happening is that grpc has limits that are server side and are not 
exposed to the clients.

I've actually added code to counter the exact exact issue [~haf] sees: 
[https://github.com/apache/flink/pull/6594/files#diff-ea875742509cef8c6f26e1b488447130R125]
 This code has been inspired/copied from the go pubsub client here: 
[https://code-review.googlesource.com/c/gocloud/+/9758/2/pubsub/service.go]

In short: Instead of acknowledging all id's at once, it tries to split it up in 
chunks of <500kb and does multiple requests. 

 

So one of two things might've happened:
 * Either [~haf] used a version of the connector that did not have this fix 
(could you confirm [~haf]?)
 * The way the connector splits up acknowledgement ids is off, adjusting for 
overhead isn't the easiest in java :(

I found this issue: [https://github.com/GoogleCloudPlatform/pubsub/pull/194] 
and they propose making ids per request configurable incase pubsub changes this 
limit on their side.

 

 

I could make the connector to split into smaller chunks and/or add it as a 
configuration option but it's quite a technical and hard to tune option.

Any thought on how to best approach this?

> GCS runtime exn: Request payload size exceeds the limit
> ---
>
> Key: FLINK-12376
> URL: https://issues.apache.org/jira/browse/FLINK-12376
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Affects Versions: 1.7.2
> Environment: FROM flink:1.8.0-scala_2.11
> ARG version=0.17
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib
> COPY target/analytics-${version}.jar /opt/flink/lib/analytics.jar
>Reporter: Henrik
>Assignee: Richard Deurwaarder
>Priority: Major
> Attachments: Screenshot 2019-04-30 at 22.32.34.png
>
>
> I'm trying to use the google cloud storage file system, but it would seem 
> that the FLINK / GCS client libs are creating too-large requests far down in 
> the GCS Java client.
> The Java client is added to the lib folder with this command in Dockerfile 
> (probably 
> [hadoop2-1.9.16|https://search.maven.org/artifact/com.google.cloud.bigdataoss/gcs-connector/hadoop2-1.9.16/jar]
>  at the time of writing):
>  
> {code:java}
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib{code}
> This is the crash output. Focus lines:
> {code:java}
> java.lang.RuntimeException: Error while confirming checkpoint{code}
> and
> {code:java}
>  Caused by: com.google.api.gax.rpc.InvalidArgumentException: 
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request payload size 
> exceeds the limit: 524288 bytes.{code}
> Full stacktrace:
>  
> {code:java}
> [analytics-867c867ff6-l622h taskmanager] 2019-04-30 20:23:14,532 INFO  
> org.apache.flink.runtime.taskmanager.Task - Source: 
> Custom Source -> Process -> Timestamps/Watermarks -> app_events (1/1) 
> (9a01e96c0271025d5ba73b735847cd4c) switched from RUNNING to FAILED.
> [analytics-867c867ff6-l622h taskmanager] java.lang.RuntimeException: Error 
> 

[jira] [Commented] (FLINK-12376) GCS runtime exn: Request payload size exceeds the limit

2019-05-08 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835427#comment-16835427
 ] 

Richard Deurwaarder commented on FLINK-12376:
-

[~StephanEwen] PubSub has no concept of ordering. This is by design: 
[https://cloud.google.com/pubsub/docs/ordering]

What is happening is that gRPC has limits that are server-side and not exposed 
to the clients.

I've actually added code to counter the exact issue [~haf] sees: 
[https://github.com/apache/flink/pull/6594/files#diff-ea875742509cef8c6f26e1b488447130R125]
 This code has been inspired/copied from the go pubsub client here: 
[https://code-review.googlesource.com/c/gocloud/+/9758/2/pubsub/service.go]

In short: instead of acknowledging all ids at once, it splits them up into 
chunks of <500 KB and does multiple requests.

So one of two things might've happened:
 * Either [~haf] used a version of the connector that did not have this fix 
(could you confirm, [~haf]?)
 * The way the connector splits up acknowledgement ids is off; adjusting for 
overhead isn't the easiest in Java :(

I found this issue: [https://github.com/GoogleCloudPlatform/pubsub/pull/194] 
and they propose making the number of ids per request configurable in case 
PubSub changes this limit on their side.

I could make the connector split into smaller chunks and/or add this as a 
configuration option, but it's quite a technical and hard-to-tune option.

Any thoughts on how best to approach this?
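
For reference, the chunking boils down to something like the sketch below. It is 
a simplification of the PR, not its exact code: the byte budget only counts the 
raw id bytes and ignores protobuf/request overhead.
{code:java}
// Simplified sketch of the chunking idea (needs java.util.* and java.nio.charset.StandardCharsets).
static List<List<String>> splitIntoChunks(List<String> ackIds, long maxBytesPerRequest) {
    List<List<String>> chunks = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long currentBytes = 0L;
    for (String id : ackIds) {
        long idBytes = id.getBytes(StandardCharsets.UTF_8).length;
        if (!current.isEmpty() && currentBytes + idBytes > maxBytesPerRequest) {
            chunks.add(current);          // this batch is full, start a new one
            current = new ArrayList<>();
            currentBytes = 0L;
        }
        current.add(id);
        currentBytes += idBytes;
    }
    if (!current.isEmpty()) {
        chunks.add(current);              // don't forget the last partial batch
    }
    return chunks;                        // acknowledge each chunk with its own request
}
{code}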

> GCS runtime exn: Request payload size exceeds the limit
> ---
>
> Key: FLINK-12376
> URL: https://issues.apache.org/jira/browse/FLINK-12376
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Affects Versions: 1.7.2
> Environment: FROM flink:1.8.0-scala_2.11
> ARG version=0.17
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib
> COPY target/analytics-${version}.jar /opt/flink/lib/analytics.jar
>Reporter: Henrik
>Priority: Major
> Attachments: Screenshot 2019-04-30 at 22.32.34.png
>
>
> I'm trying to use the google cloud storage file system, but it would seem 
> that the FLINK / GCS client libs are creating too-large requests far down in 
> the GCS Java client.
> The Java client is added to the lib folder with this command in Dockerfile 
> (probably 
> [hadoop2-1.9.16|https://search.maven.org/artifact/com.google.cloud.bigdataoss/gcs-connector/hadoop2-1.9.16/jar]
>  at the time of writing):
>  
> {code:java}
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib{code}
> This is the crash output. Focus lines:
> {code:java}
> java.lang.RuntimeException: Error while confirming checkpoint{code}
> and
> {code:java}
>  Caused by: com.google.api.gax.rpc.InvalidArgumentException: 
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request payload size 
> exceeds the limit: 524288 bytes.{code}
> Full stacktrace:
>  
> {code:java}
> [analytics-867c867ff6-l622h taskmanager] 2019-04-30 20:23:14,532 INFO  
> org.apache.flink.runtime.taskmanager.Task - Source: 
> Custom Source -> Process -> Timestamps/Watermarks -> app_events (1/1) 
> (9a01e96c0271025d5ba73b735847cd4c) switched from RUNNING to FAILED.
> [analytics-867c867ff6-l622h taskmanager] java.lang.RuntimeException: Error 
> while confirming checkpoint
> [analytics-867c867ff6-l622h taskmanager]     at 
> org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1211)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.lang.Thread.run(Thread.java:748)
> [analytics-867c867ff6-l622h taskmanager] Caused by: 
> com.google.api.gax.rpc.InvalidArgumentException: 
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request payload size 
> exceeds the limit: 524288 bytes.
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:49)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
> [analytics-867c867ff6-l622h taskmanager]     at 
> 

[jira] [Comment Edited] (FLINK-12376) GCS runtime exn: Request payload size exceeds the limit

2019-05-08 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835427#comment-16835427
 ] 

Richard Deurwaarder edited comment on FLINK-12376 at 5/8/19 8:37 AM:
-

[~StephanEwen] PubSub has no concept of ordering. For better or worse, this is 
by design: [https://cloud.google.com/pubsub/docs/ordering]

What is happening is that gRPC has limits that are server-side and not exposed 
to the clients.

I've actually added code to counter the exact issue [~haf] sees: 
[https://github.com/apache/flink/pull/6594/files#diff-ea875742509cef8c6f26e1b488447130R125]
 This code has been inspired/copied from the go pubsub client here: 
[https://code-review.googlesource.com/c/gocloud/+/9758/2/pubsub/service.go]

In short: instead of acknowledging all ids at once, it splits them up into 
chunks of <500 KB and does multiple requests.

So one of two things might've happened:
 * Either [~haf] used a version of the connector that did not have this fix 
(could you confirm, [~haf]?)
 * The way the connector splits up acknowledgement ids is off; adjusting for 
overhead isn't the easiest in Java :(

I found this issue: [https://github.com/GoogleCloudPlatform/pubsub/pull/194] 
and they propose making the number of ids per request configurable in case 
PubSub changes this limit on their side.

I could make the connector split into smaller chunks and/or add this as a 
configuration option, but it's quite a technical and hard-to-tune option.

Any thoughts on how best to approach this?


was (Author: xeli):
[~StephanEwen] PubSub has no concept of ordering. For better of worse this is 
by design: [https://cloud.google.com/pubsub/docs/ordering] 

 

So what is happening is that grpc has limits that are server side and are not 
exposed to the clients.

I've actually added code to counter the exact exact issue [~haf] sees: 
[https://github.com/apache/flink/pull/6594/files#diff-ea875742509cef8c6f26e1b488447130R125]
 This code has been inspired/copied from the go pubsub client here: 
[https://code-review.googlesource.com/c/gocloud/+/9758/2/pubsub/service.go]

In short: Instead of acknowledging all id's at once, it tries to split it up in 
chunks of <500kb and does multiple requests. 

 

So one of two things might've happened:
 * Either [~haf] used a version of the connector that did not have this fix 
(could you confirm [~haf]?)
 * The way the connector splits up acknowledgement ids is off, adjusting for 
overhead isn't the easiest in java :(

I found this issue: [https://github.com/GoogleCloudPlatform/pubsub/pull/194] 
and they propose making ids per request configurable incase pubsub changes this 
limit on their side.

 

 

I could make the connector to split into smaller chunks and/or add it as a 
configuration option but it's quite a technical and hard to tune option.

Any thought on how to best approach this?

> GCS runtime exn: Request payload size exceeds the limit
> ---
>
> Key: FLINK-12376
> URL: https://issues.apache.org/jira/browse/FLINK-12376
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Affects Versions: 1.7.2
> Environment: FROM flink:1.8.0-scala_2.11
> ARG version=0.17
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib
> COPY target/analytics-${version}.jar /opt/flink/lib/analytics.jar
>Reporter: Henrik
>Assignee: Richard Deurwaarder
>Priority: Major
> Attachments: Screenshot 2019-04-30 at 22.32.34.png
>
>
> I'm trying to use the google cloud storage file system, but it would seem 
> that the FLINK / GCS client libs are creating too-large requests far down in 
> the GCS Java client.
> The Java client is added to the lib folder with this command in Dockerfile 
> (probably 
> [hadoop2-1.9.16|https://search.maven.org/artifact/com.google.cloud.bigdataoss/gcs-connector/hadoop2-1.9.16/jar]
>  at the time of writing):
>  
> {code:java}
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib{code}
> This is the crash output. Focus lines:
> {code:java}
> java.lang.RuntimeException: Error while confirming checkpoint{code}
> and
> {code:java}
>  Caused by: com.google.api.gax.rpc.InvalidArgumentException: 
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request payload size 
> exceeds the limit: 524288 bytes.{code}
> Full stacktrace:
>  
> {code:java}
> [analytics-867c867ff6-l622h taskmanager] 2019-04-30 20:23:14,532 INFO  
> org.apache.flink.runtime.taskmanager.Task - Source: 
> Custom Source -> Process -> Timestamps/Watermarks -> app_events (1/1) 
> (9a01e96c0271025d5ba73b735847cd4c) switched from RUNNING to FAILED.
> [analytics-867c867ff6-l622h taskmanager] 

[jira] [Assigned] (FLINK-12376) GCS runtime exn: Request payload size exceeds the limit

2019-05-08 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder reassigned FLINK-12376:
---

Assignee: Richard Deurwaarder

> GCS runtime exn: Request payload size exceeds the limit
> ---
>
> Key: FLINK-12376
> URL: https://issues.apache.org/jira/browse/FLINK-12376
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Affects Versions: 1.7.2
> Environment: FROM flink:1.8.0-scala_2.11
> ARG version=0.17
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib
> COPY target/analytics-${version}.jar /opt/flink/lib/analytics.jar
>Reporter: Henrik
>Assignee: Richard Deurwaarder
>Priority: Major
> Attachments: Screenshot 2019-04-30 at 22.32.34.png
>
>
> I'm trying to use the google cloud storage file system, but it would seem 
> that the FLINK / GCS client libs are creating too-large requests far down in 
> the GCS Java client.
> The Java client is added to the lib folder with this command in Dockerfile 
> (probably 
> [hadoop2-1.9.16|https://search.maven.org/artifact/com.google.cloud.bigdataoss/gcs-connector/hadoop2-1.9.16/jar]
>  at the time of writing):
>  
> {code:java}
> ADD 
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>  /opt/flink/lib{code}
> This is the crash output. Focus lines:
> {code:java}
> java.lang.RuntimeException: Error while confirming checkpoint{code}
> and
> {code:java}
>  Caused by: com.google.api.gax.rpc.InvalidArgumentException: 
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request payload size 
> exceeds the limit: 524288 bytes.{code}
> Full stacktrace:
>  
> {code:java}
> [analytics-867c867ff6-l622h taskmanager] 2019-04-30 20:23:14,532 INFO  
> org.apache.flink.runtime.taskmanager.Task - Source: 
> Custom Source -> Process -> Timestamps/Watermarks -> app_events (1/1) 
> (9a01e96c0271025d5ba73b735847cd4c) switched from RUNNING to FAILED.
> [analytics-867c867ff6-l622h taskmanager] java.lang.RuntimeException: Error 
> while confirming checkpoint
> [analytics-867c867ff6-l622h taskmanager]     at 
> org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1211)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [analytics-867c867ff6-l622h taskmanager]     at 
> java.lang.Thread.run(Thread.java:748)
> [analytics-867c867ff6-l622h taskmanager] Caused by: 
> com.google.api.gax.rpc.InvalidArgumentException: 
> io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request payload size 
> exceeds the limit: 524288 bytes.
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:49)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
> [analytics-867c867ff6-l622h taskmanager]     at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
> [analytics-867c867ff6-l622h taskmanager]     at 
> io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:507)
> [analytics-867c867ff6-l622h taskmanager]     at 
> io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482)
> [analytics-867c867ff6-l622h taskmanager]     at 
> 

[jira] [Updated] (FLINK-12325) Statsd reporter gives wrong metrics when using negative numbers

2019-04-24 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder updated FLINK-12325:

Description: 
I believe the statsd reporter has a bug when using negative numbers.

 

This is because when a metric is sent it is first converted to a string value. 
This means 100 becomes "100" and -100 becomes "-100". This value is then sent 
to statsd as a gauge value.

See this line for the conversion to string: 
[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]

And this line for sending it as a gauge:

[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]

 

This means a value of -100 will be sent like this:
{code:java}
:-100|g{code}
 

The statsd protocol however states the following 
([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
{noformat}
Adding a sign to the gauge value will change the value, rather than setting it.
{noformat}
 

 

So sending -100 multiple times means the gauge in statsd will be decremented 
multiple times, rather than set to -100.

 

I believe this isn't how flink expects it to work, is it?

  was:
The statsd reporter has a bug I believe when using negative numbers.

 

This is because when a metric is sent it is first converted to a string value. 
This means 100 becomes "100" and -100 becomes "-100". This value is then sent 
to statsd as a gauge value.

See this line for the conversion to string: 
[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]

And this line for sending it as a gauge:

[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]

 

This means a value of -100 will be sent like this:
{code:java}
:-100|g{code}
 

The statsd protocol how ever states the following 
([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
{noformat}
Adding a sign to the gauge value will change the value, rather than setting it.
{noformat}
 

 

So sending -100 multiple times means the gauge in statsd will be decremented 
multiple times, rather than set to -100.

 

I believe this isn't how flink expects it to work, isn't it?


> Statsd reporter gives wrong metrics when using negative numbers
> ---
>
> Key: FLINK-12325
> URL: https://issues.apache.org/jira/browse/FLINK-12325
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Richard Deurwaarder
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The statsd reporter has a bug I believe when using negative numbers.
>  
> This is because when a metric is sent it is first converted to a string 
> value. This means 100 becomes "100" and -100 becomes "-100". This value is 
> then sent to statsd as a gauge value.
> See this line for the conversion to string: 
> [https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]
> And this line for sending it as a gauge:
> [https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]
>  
> This means a value of -100 will be sent like this:
> {code:java}
> :-100|g{code}
>  
> The statsd protocol however states the following 
> ([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
> {noformat}
> Adding a sign to the gauge value will change the value, rather than setting 
> it.
> {noformat}
>  
>  
> So sending -100 multiple times means the gauge in statsd will be decremented 
> multiple times, rather than set to -100.
>  
> I believe this isn't how flink expects it to work, is it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12325) Statsd reporter gives wrong metrics when using negative numbers

2019-04-24 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-12325:
---

 Summary: Statsd reporter gives wrong metrics when using negative 
numbers
 Key: FLINK-12325
 URL: https://issues.apache.org/jira/browse/FLINK-12325
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics
Affects Versions: 1.8.0, 1.7.2, 1.6.4
Reporter: Richard Deurwaarder


I believe the statsd reporter has a bug when using negative numbers.

 

This is because when a metric is sent it is first converted to a string value. 
This means 100 becomes "100" and -100 becomes "-100". This value is then sent 
to statsd as a gauge value.

See this line for the conversion to string: 
[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]

And this line for sending it as a gauge:

[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]

 

So a value of -100 will be sent like this:
{code:java}
:-100|g{code}
 

The statsd protocol however states the following 
([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
{noformat}
Adding a sign to the gauge value will change the value, rather than setting it.
{noformat}
 

 

So sending -100 multiple times means the gauge in statsd will be decremented 
multiple times, rather than set to -100.

 

I believe this isn't how flink expects it to work, isn't it?
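
To illustrate what those gauge semantics mean in practice (metric name and values 
are made up for the example; the usual client-side workaround is to reset the 
gauge to zero before applying a negative value):
{code:java}
// Illustration of the statsd gauge semantics described above.
String metric = "flink.some.gauge";
int value = -100;
String payload = value < 0
        ? metric + ":0|g\n" + metric + ":" + value + "|g" // reset to 0, then go to -100
        : metric + ":" + value + "|g";                    // non-negative values set the gauge directly
// 'payload' is what would be written to the statsd UDP socket
{code}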



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12325) Statsd reporter gives wrong metrics when using negative numbers

2019-04-24 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder updated FLINK-12325:

Description: 
The statsd reporter has a bug I believe when using negative numbers.

 

This is because when a metric is sent it is first converted to a string value. 
This means 100 becomes "100" and -100 becomes "-100". This value is then sent 
to statsd as a gauge value.

See this line for the conversion to string: 
[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]

And this line for sending it as a gauge:

[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]

 

This means a value of -100 will be sent like this:
{code:java}
:-100|g{code}
 

The statsd protocol however states the following 
([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
{noformat}
Adding a sign to the gauge value will change the value, rather than setting it.
{noformat}
 

 

So sending -100 multiple times means the gauge in statsd will be decremented 
multiple times, rather than set to -100.

 

I believe this isn't how flink expects it to work, isn't it?

  was:
The statsd reporter has a bug I believe when using negative numbers.

 

This is because when a metric is sent it is first converted to a string value. 
This means 100 becomes "100" and -100 becomes "-100". This value is then sent 
to statsd as a gauge value.

See this line for the conversion to string: 
[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]

And this line for sending it as a gauge:

[https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]

 

So a value of -100 will be sent like this:
{code:java}
:-100|g{code}
 

The statsd protocol, however, states the following 
([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
{noformat}
Adding a sign to the gauge value will change the value, rather than setting it.
{noformat}
 

 

So sending -100 multiple times means the gauge in statsd will be decremented 
multiple times, rather than set to -100.

 

I believe this isn't how Flink expects it to work, is it?


> Statsd reporter gives wrong metrics when using negative numbers
> ---
>
> Key: FLINK-12325
> URL: https://issues.apache.org/jira/browse/FLINK-12325
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Metrics
>Affects Versions: 1.6.4, 1.7.2, 1.8.0
>Reporter: Richard Deurwaarder
>Priority: Minor
>
> I believe the statsd reporter has a bug when reporting negative numbers.
>  
> This is because when a metric is sent it is first converted to a string 
> value. This means 100 becomes "100" and -100 becomes "-100". This value is 
> then sent to statsd as a gauge value.
> See this line for the conversion to string: 
> [https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L130]
> And this line for sending it as a gauge:
> [https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-statsd/src/main/java/org/apache/flink/metrics/statsd/StatsDReporter.java#L184]
>  
> This means a value of -100 will be sent like this:
> {code:java}
> :-100|g{code}
>  
> The statsd protocol, however, states the following 
> ([https://github.com/statsd/statsd/blob/master/docs/metric_types.md#gauges]):
> {noformat}
> Adding a sign to the gauge value will change the value, rather than setting 
> it.
> {noformat}
>  
>  
> So sending -100 multiple times means the gauge in statsd will be decremented 
> multiple times, rather than set to -100.
>  
> I believe this isn't how Flink expects it to work, is it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12075) Not able to submit jobs on YARN when there's a firewall

2019-04-02 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807578#comment-16807578
 ] 

Richard Deurwaarder commented on FLINK-12075:
-

I've just tested the PR of [~till.rohrmann] on our cluster and this works 
perfectly for us.

 

Thank you [~aljoscha] and [~till.rohrmann] for fixing and giving this priority 
for 1.8.0!

> Not able to submit jobs on YARN when there's a firewall
> ---
>
> Key: FLINK-12075
> URL: https://issues.apache.org/jira/browse/FLINK-12075
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Richard Deurwaarder
>Assignee: Till Rohrmann
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.8.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If there is a firewall between the YARN cluster and the machine submitting the 
> Flink job, this is impractical because new Flink clusters start up with random 
> ports for REST communication.
>  
> FLINK-5758 should've fixed this. But it seems FLINK-11081 either undid the 
> changes or did not implement this. The relevant code is changed in 
> FLINK-11081 
> ([https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102])
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12075) Not able to submit jobs on YARN when there's a firewall

2019-04-01 Thread Richard Deurwaarder (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Deurwaarder updated FLINK-12075:

Description: 
If there is a firewall between the YARN cluster and the machine submitting the 
Flink job, this is impractical because new Flink clusters start up with random 
ports for REST communication.

 

FLINK-5758 should've fixed this. But it seems FLINK-11081 either undid the 
changes or did not implement this. The relevant code is changed in FLINK-11081 
([https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102])

 

 

  was:If there is a firewall between the YARN cluster and the machine submitting 
the Flink job, this is impractical because new Flink clusters start up with 
random ports for REST communication.


> Not able to submit jobs on YARN when there's a firewall
> ---
>
> Key: FLINK-12075
> URL: https://issues.apache.org/jira/browse/FLINK-12075
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.7.2, 1.8.0
>Reporter: Richard Deurwaarder
>Priority: Major
>
> If there is a firewall between the YARN cluster and the machine submitting the 
> Flink job, this is impractical because new Flink clusters start up with random 
> ports for REST communication.
>  
> FLINK-5758 should've fixed this. But it seems FLINK-11081 either undid the 
> changes or did not implement this. The relevant code is changed in 
> FLINK-11081 
> ([https://github.com/apache/flink/commit/730eed71ef3f718d61f85d5e94b1060844ca56db#diff-487838863ab693af7008f04cb3359be3R102])
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12075) Not able to submit jobs on YARN when there's a firewall

2019-04-01 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-12075:
---

 Summary: Not able to submit jobs on YARN when there's a firewall
 Key: FLINK-12075
 URL: https://issues.apache.org/jira/browse/FLINK-12075
 Project: Flink
  Issue Type: Bug
  Components: Deployment / YARN
Affects Versions: 1.7.2, 1.8.0
Reporter: Richard Deurwaarder


If there is a firewall between the YARN cluster and the machine submitting the 
Flink job, this is impractical because new Flink clusters start up with random 
ports for REST communication.
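
For reference, the workaround one typically wants here is to pin the REST port (or a port range) in flink-conf.yaml so the firewall can whitelist it; a hedged sketch, assuming a Flink version in which these options are available (the exact keys may differ per version):
{noformat}
# port the client uses to reach the cluster
rest.port: 8081
# server-side bind port range, so the firewall only needs to allow this range
rest.bind-port: 50100-50200
{noformat}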



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10419) ClassNotFoundException while deserializing user exceptions from checkpointing

2018-09-28 Thread Richard Deurwaarder (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632161#comment-16632161
 ] 

Richard Deurwaarder commented on FLINK-10419:
-

I have actually run into a very similar exception while working on FLINK-9311
{code:java}
java.lang.NoClassDefFoundError: 
io/grpc/netty/shaded/io/grpc/netty/NettyClientTransport$6
at 
io.grpc.netty.shaded.io.grpc.netty.NettyClientTransport.shutdownNow(NettyClientTransport.java:287)
at 
io.grpc.internal.KeepAliveManager$ClientKeepAlivePinger.onPingTimeout(KeepAliveManager.java:282)
at io.grpc.internal.KeepAliveManager$1.run(KeepAliveManager.java:58)
at io.grpc.internal.LogExceptionRunnable.run(LogExceptionRunnable.java:41)
at 
io.grpc.netty.shaded.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at 
io.grpc.netty.shaded.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:125)
at 
io.grpc.netty.shaded.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at 
io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at 
io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:465)
at 
io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at 
io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: 
io.grpc.netty.shaded.io.grpc.netty.NettyClientTransport$6
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at 
org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$ChildFirstClassLoader.loadClass(FlinkUserCodeClassLoaders.java:129)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 12 more{code}
How did you figure out the initial error?
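
As an aside, the general remedy for this class of problem is to wrap the user-code exception before it crosses the RPC boundary, so that only Flink classes are needed to deserialize it; a hedged sketch (not necessarily the fix applied for this ticket), assuming org.apache.flink.util.SerializedThrowable:
{code:java}
import org.apache.flink.util.SerializedThrowable;

public class WrapUserFailure {
    public static void main(String[] args) {
        // A user-code exception whose class may not exist on the coordinator's classpath.
        Throwable userCodeFailure = new RuntimeException("failure from user code");

        // SerializedThrowable keeps the message and stack trace, but can be
        // deserialized without the original exception class being available.
        SerializedThrowable shippable = new SerializedThrowable(userCodeFailure);
        System.out.println(shippable.getMessage());
    }
}
{code}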

> ClassNotFoundException while deserializing user exceptions from checkpointing
> -
>
> Key: FLINK-10419
> URL: https://issues.apache.org/jira/browse/FLINK-10419
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination, State Backends, Checkpointing
>Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.6.0, 1.6.1, 1.7.0, 1.5.4
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
>
> It seems that somewhere in the operator's failure handling, we hand a 
> user-code exception to the checkpoint coordinator via Java serialization but 
> it will then fail during the de-serialization because the class is not 
> available. This will result in the following error shadowing the real one:
> {code}
> java.lang.ClassNotFoundException: 
> org.apache.flink.streaming.connectors.kafka.FlinkKafka011Exception
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher.loadClass(Launcher.java:338)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at 
> org.apache.flink.util.InstantiationUtil.resolveClass(InstantiationUtil.java:76)
> at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1859)
> at 
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1745)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2033)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
> at 
> java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:557)
> at java.lang.Throwable.readObject(Throwable.java:914)
> at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
> at 
> org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation.readObject(RemoteRpcInvocation.java:222)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> 

[jira] [Comment Edited] (FLINK-9311) PubSub connector

2018-05-07 Thread Richard Deurwaarder (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466409#comment-16466409
 ] 

Richard Deurwaarder edited comment on FLINK-9311 at 5/7/18 8:06 PM:


PubSub works more like RabbitMQ than Kafka, using ack/nack. It has 
[at-least-once delivery|https://cloud.google.com/pubsub/docs/subscriber], but 
using the [message 
id|http://googleapis.github.io/googleapis/java/all/latest/apidocs/com/google/pubsub/v1/PubsubMessageOrBuilder.html#getMessageId--]
 exactly-once can be achieved.

Is there an existing connector using message IDs for deduplication?
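
For illustration, a minimal sketch of what deduplication on the message id could look like (hypothetical class; in a real source the set of seen ids would live in checkpointed state rather than on the heap):
{code:java}
import java.util.HashSet;
import java.util.Set;

import com.google.pubsub.v1.PubsubMessage;

public class MessageIdDeduplicator {

    private final Set<String> seenMessageIds = new HashSet<>();

    /** Returns true if this message id was already emitted and should only be acked. */
    public boolean isDuplicate(PubsubMessage message) {
        return !seenMessageIds.add(message.getMessageId());
    }
}
{code}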


was (Author: xeli):
PubSub works more like RabbitMQ than Kafka, using ack/nack. It has 
at-least-once delivery, but using the [message 
id|http://googleapis.github.io/googleapis/java/all/latest/apidocs/com/google/pubsub/v1/PubsubMessageOrBuilder.html#getMessageId--]
 exactly-once can be achieved.

Is there an existing connector using message IDs for deduplication?

> PubSub connector
> 
>
> Key: FLINK-9311
> URL: https://issues.apache.org/jira/browse/FLINK-9311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Richard Deurwaarder
>Priority: Minor
>
> I would like to start adding some google cloud connectors, starting with a PubSub 
> Source. I have a basic implementation ready, but I want it to be able to:
>  * easily scale up (should I have it extend RichParallelSourceFunction for 
> this?)
>  * Make it easier to provide the google cloud credentials. This would require 
> being able to send some json string / ServiceAccount to the nodes when 
> starting up this source.
> Could this be something that would be useful for others and added to the 
> flink connectors repo?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9311) PubSub connector

2018-05-07 Thread Richard Deurwaarder (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466409#comment-16466409
 ] 

Richard Deurwaarder commented on FLINK-9311:


PubSub works more like RabbitMQ than Kafka, using ack/nack. It has 
at-least-once delivery, but using the [message 
id|http://googleapis.github.io/googleapis/java/all/latest/apidocs/com/google/pubsub/v1/PubsubMessageOrBuilder.html#getMessageId--]
 exactly-once can be achieved.

Is there an existing connector using message IDs for deduplication?

> PubSub connector
> 
>
> Key: FLINK-9311
> URL: https://issues.apache.org/jira/browse/FLINK-9311
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Richard Deurwaarder
>Priority: Minor
>
> I would like to start adding some google cloud connectors, starting with a PubSub 
> Source. I have a basic implementation ready, but I want it to be able to:
>  * easily scale up (should I have it extend RichParallelSourceFunction for 
> this?)
>  * Make it easier to provide the google cloud credentials. This would require 
> being able to send some json string / ServiceAccount to the nodes when 
> starting up this source.
> Could this be something that would be useful for others and added to the 
> flink connectors repo?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9311) PubSub connector

2018-05-07 Thread Richard Deurwaarder (JIRA)
Richard Deurwaarder created FLINK-9311:
--

 Summary: PubSub connector
 Key: FLINK-9311
 URL: https://issues.apache.org/jira/browse/FLINK-9311
 Project: Flink
  Issue Type: New Feature
  Components: Streaming Connectors
Reporter: Richard Deurwaarder


I would like to start adding some Google Cloud connectors, starting with a PubSub 
Source. I have a basic implementation ready, but I want it to be able to:
 * easily scale up (should I have it extend RichParallelSourceFunction for 
this?)
 * Make it easier to provide the google cloud credentials. This would require 
being able to send some json string / ServiceAccount to the nodes when starting 
up this source.

Could this be something that would be useful for others and added to the flink 
connectors repo?
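
For illustration, a hedged sketch of the rough shape such a source could take (the class name, constructor, and the PubSub calls, left as comments, are illustrative only, not the actual connector code):
{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;

public class PubSubSource extends RichParallelSourceFunction<String> {

    // Service account JSON passed in when the job graph is built, so it is
    // serialized and shipped to every parallel instance of the source.
    private final String serviceAccountJson;
    private volatile boolean running = true;

    public PubSubSource(String serviceAccountJson) {
        this.serviceAccountJson = serviceAccountJson;
    }

    @Override
    public void open(Configuration parameters) {
        // build the PubSub subscriber here from serviceAccountJson (omitted)
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // pull a batch from PubSub, then emit and ack under the checkpoint lock:
            // synchronized (ctx.getCheckpointLock()) { ctx.collect(payload); }
            Thread.sleep(100); // placeholder for the actual pull
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
{code}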



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)