Re: Stateful Functions + ML model prediction

2020-10-07 Thread Tzu-Li (Gordon) Tai
Hi John, Thanks a lot for opening the JIRA ticket! If you are interested in contributing that to StateFun, I'm also happy to guide you with the contribution. On Mon, Oct 5, 2020 at 10:24 PM John Morrow wrote: > Thanks for the response Gordon, and that FlinkForward presentation - it's > been

Re: Statefun + Confluent Fully-managed Kafka

2020-10-07 Thread Tzu-Li (Gordon) Tai
Hi Hezekiah, I've confirmed that the Kafka properties set in the module specification file (module.yaml) are indeed correctly being parsed and used to construct the internal Kafka clients. StateFun / Flink does not alter or modify the properties. So, this should be something wrong with your

Re: Issue with Flink - unrelated executeSql causes other executeSqls to fail.

2020-10-07 Thread Dan Hill
Switching to junit4 did not help. If I make a request to the url returned from MiniClusterWithClientResource.flinkCluster.getClusterClient().getWebInterfaceURL(), I get {"errors":["Not found."]}. I'm not sure if this is intentional. On Tue, Oct 6, 2020 at 4:16 PM Dan Hill wrote: >

Network issue leading to "No pooled slot available"

2020-10-07 Thread Dan Diephouse
I am now using the S3 StreamingFileSink to send data to an S3 bucket. If/when the network connection has issues, it seems to put Flink into an irrecoverable state. Am I understanding this correctly? Any suggestions on how to troubleshoot / fix? Here is what I'm observing: *1. Network is dropped

NoResourceAvailableException

2020-10-07 Thread Alexander Semeshchenko
After installing (download & tar zxf) Apache Flink 1.11.1 and running ./bin/flink run examples/streaming/WordCount.jar, it shows, after roughly 5 minutes of trying to submit, the message: Caused by:

Re: S3 StreamingFileSink issues

2020-10-07 Thread Dan Diephouse
FYI - I discovered that if I specify the Hadoop compression codec it works fine. E.g.: CompressWriters.forExtractor(new DefaultExtractor()).withHadoopCompression("GzipCodec") Haven't dug into exactly why yet. On Wed, Oct 7, 2020 at 12:14 PM David Anderson wrote: > Looping in @Kostas Kloudas
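Based on Dan's workaround, a minimal sketch of the sink construction (the bucket path and element type are placeholders, not from the thread, and this assumes the flink-compress and Hadoop dependencies are on the classpath):

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.compress.CompressWriters;
import org.apache.flink.formats.compress.extractor.DefaultExtractor;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Hedged sketch of the workaround described above: explicitly naming the
// Hadoop GzipCodec when building the bulk-format compress writer.
StreamingFileSink<String> sink = StreamingFileSink
    .forBulkFormat(
        new Path("s3://my-bucket/compressed-output"),   // hypothetical bucket
        CompressWriters.forExtractor(new DefaultExtractor<String>())
            .withHadoopCompression("GzipCodec"))
    .build();

// stream.addSink(sink);   // attach to an existing DataStream<String>
```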

Re: S3 StreamingFileSink issues

2020-10-07 Thread David Anderson
Looping in @Kostas Kloudas who should be able to clarify things. David On Wed, Oct 7, 2020 at 7:12 PM Dan Diephouse wrote: > Thanks! Completely missed that in the docs. It's now working, however it's > not working with compression writers. Someone else noted this issue here: > > >

Re: Is it possible that late events are processed before the window?

2020-10-07 Thread Ori Popowski
Thanks On Wed, Oct 7, 2020 at 7:06 PM Till Rohrmann wrote: > Hi Ori, > > you are right. Events are being sent down the side output for late events > if the event's timestamp + the allowed lateness is smaller than the current > watermark. These events are directly seen by downstream operators

Re: The file STDOUT does not exist on the TaskExecutor

2020-10-07 Thread sidhant gupta
Hi Till, I understand the errors which appear in my logs are not stopping me from running the job. I am running a Flink session cluster in ECS and have also configured Graylog to collect the container logs, so getting the Docker logs is not an issue. But is there a way to suppress this error or any

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-07 Thread Dylan Forciea
Actually…. It looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated. Dylan From: Dylan Forciea Date: Wednesday, October 7, 2020 at 11:47 AM To: Till Rohrmann , dev Cc: Shengkai Fang , "user@flink.apache.org" , "j...@apache.org" , Leonard Xu

Re: S3 StreamingFileSink issues

2020-10-07 Thread Dan Diephouse
Thanks! Completely missed that in the docs. It's now working, however it's not working with compression writers. Someone else noted this issue here: https://stackoverflow.com/questions/62138635/flink-streaming-compression-not-working-using-amazon-aws-s3-connector-streaming Looking at the code,

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-07 Thread Dylan Forciea
Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and

Re: Applying Custom metrics

2020-10-07 Thread Till Rohrmann
Hi Piper, the RichMapFunction's map function is called for every record you are processing in your DataStream/DataSet. The RichMapFunction is the definition of the map function you are applying to every record. Hence, it is basically what you pass to the DataStream.map(MapFunction myMapFunction)
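A minimal sketch of what Till describes, combined with the thread's topic of custom metrics — the function is opened once per parallel instance, while map() runs for every record. The metric name and the transformation are illustrative assumptions:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Hedged sketch: a RichMapFunction that registers a custom counter in open()
// and increments it in map(), which is invoked for every record.
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter recordsSeen;

    @Override
    public void open(Configuration parameters) {
        // open() runs once per parallel instance, before any records arrive
        recordsSeen = getRuntimeContext()
            .getMetricGroup()
            .counter("recordsSeen");   // hypothetical metric name
    }

    @Override
    public String map(String value) {
        recordsSeen.inc();             // bumped once per processed record
        return value.toUpperCase();
    }
}
```

It would then be applied as `stream.map(new CountingMapper())`.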

Re: windowsState() and globalState()

2020-10-07 Thread Till Rohrmann
Hi Jiazhi, here is a description of Flink's windowing API and also how to use the windowState() and globalState() method of the ProcessWindowFunction [1]. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#using-per-window-state-in-processwindowfunction
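A hedged sketch of the two state scopes from the linked docs — windowState() is scoped to a single window instance, globalState() is shared across all windows of the same key. State names and types are illustrative:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Hedged sketch: count firings per window vs. across all windows of a key.
public class StatefulWindowFn
        extends ProcessWindowFunction<Long, String, String, TimeWindow> {

    private final ValueStateDescriptor<Long> perWindowDesc =
        new ValueStateDescriptor<>("firings-this-window", Long.class);
    private final ValueStateDescriptor<Long> globalDesc =
        new ValueStateDescriptor<>("firings-all-windows", Long.class);

    @Override
    public void process(String key, Context ctx, Iterable<Long> elements,
                        Collector<String> out) throws Exception {
        // scoped to this one window instance (cleared with the window)
        ValueState<Long> perWindow = ctx.windowState().getState(perWindowDesc);
        // shared across every window seen for this key
        ValueState<Long> global = ctx.globalState().getState(globalDesc);

        long w = perWindow.value() == null ? 1 : perWindow.value() + 1;
        long g = global.value() == null ? 1 : global.value() + 1;
        perWindow.update(w);
        global.update(g);

        out.collect(key + ": fired " + w + "x for this window, " + g + "x overall");
    }
}
```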

Re: flink configuration: best practice for checkpoint storage secrets

2020-10-07 Thread Till Rohrmann
Hi Qinghui, the recommended way would be to use AWS identity and access management (IAM) [1] if possible. [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/s3.html#configure-access-credentials Cheers, Till On Wed, Oct 7, 2020 at 12:31 PM XU Qinghui wrote: > Hello,
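With IAM roles attached to the machines, there is nothing secret to put into the configuration at all — the S3 filesystem obtains credentials from the environment. A minimal flink-conf.yaml sketch (bucket name is a placeholder):

```yaml
# flink-conf.yaml — with IAM-based credentials nothing secret is configured;
# only the checkpoint target is needed.
state.checkpoints.dir: s3://my-bucket/checkpoints   # hypothetical bucket
```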

Re: Flink Kubernetes Libraries

2020-10-07 Thread Till Rohrmann
Hi Saksham, the easiest approach would probably be to include the required libraries in your user code jar which you submit to the cluster. Using maven's shade plugin should help with this task. Alternatively, you could also create a custom Flink Docker image where you add the required libraries
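A minimal sketch of the second option — a custom image layering extra jars into the Flink lib directory. The base image tag and jar names are assumptions:

```dockerfile
# Hedged sketch: custom Flink image with additional libraries on the classpath.
FROM flink:1.11.2-scala_2.11
COPY ./my-connector.jar /opt/flink/lib/
COPY ./my-format.jar    /opt/flink/lib/
```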

Re: Is it possible that late events are processed before the window?

2020-10-07 Thread Till Rohrmann
Hi Ori, you are right. Events are being sent down the side output for late events if the event's timestamp + the allowed lateness is smaller than the current watermark. These events are directly seen by downstream operators which consume the side output for late events. Cheers, Till On Wed, Oct
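The setup Till describes can be sketched as follows; Event, Result, MyWindowFunction and the window sizes are hypothetical stand-ins, not from the thread:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// Hedged sketch: events whose timestamp + allowed lateness is behind the
// current watermark are routed to the late-data side output and are seen
// directly by whatever consumes that side output.
final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

SingleOutputStreamOperator<Result> windowed = events
    .keyBy(Event::getKey)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .allowedLateness(Time.minutes(1))
    .sideOutputLateData(lateTag)
    .process(new MyWindowFunction());        // hypothetical window function

DataStream<Event> lateEvents = windowed.getSideOutput(lateTag);
```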

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-07 Thread Till Rohrmann
Hi Dylan, thanks for reaching out to the Flink community and excuse our late response. I am not an expert for the Table API and its JDBC connector but what you describe sounds like a missing feature. Also given that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might

Re: The file STDOUT does not exist on the TaskExecutor

2020-10-07 Thread Till Rohrmann
Hi Sidhant, when using Flink's Docker image, then the cluster won't create the out files. Instead the components will directly write to STDOUT which is captured by Kubernetes and can be viewed using `kubectl logs POD_NAME`. The error which appears in your logs is not a problem. It is simply the

Re: Statefun + Confluent Fully-managed Kafka

2020-10-07 Thread Till Rohrmann
Hi Hezekiah, thanks for reporting this issue. I am pulling Gordon and Igal in who might be able to help you with this problem. Cheers, Till On Wed, Oct 7, 2020 at 3:56 PM hezekiah maina wrote: > Hi, > > I'm trying to use Stateful Functions with Kafka as my ingress and egress. > I'm using the

Re: The file STDOUT does not exist on the TaskExecutor

2020-10-07 Thread 大森林
What's your run mode? If your Flink cluster is in YARN mode, then the output you need has no relation to $FLINK_HOME/logs/*.out

Re: The file STDOUT does not exist on the TaskExecutor

2020-10-07 Thread sidhant gupta
Hi, I'm running a Flink cluster in ECS. There is a pipeline which creates the job manager and then the task manager using the Docker image. Not sure if we would want to restart the cluster in production. Is there any way we can make sure the .out files will be created without a restart? I am able

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-07 Thread Dylan Forciea
I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it. Thanks, Dylan From: Shengkai Fang Date: Wednesday, October 7, 2020 at 9:06 AM To: Dylan Forciea

Re: The file STDOUT does not exist on the TaskExecutor

2020-10-07 Thread 大森林
It's easy: just restart your Flink cluster (standalone mode). If you run Flink in YARN mode, the result will show up in the $HADOOP/logs/*.out files

Statefun + Confluent Fully-managed Kafka

2020-10-07 Thread hezekiah maina
Hi, I'm trying to use Stateful Functions with Kafka as my ingress and egress. I'm using the Confluent fully-managed Kafka and I'm having a challenge adding my authentication details in the module.yaml file. Here is my current config details: version: "1.0" module: meta: type: remote spec:
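For reference, a hedged sketch of how SASL credentials for Confluent Cloud are typically passed through the ingress properties in a StateFun module.yaml — the ingress id, bootstrap address, topic, target and credentials below are all placeholders, and the exact keys should be checked against the StateFun docs for the version in use:

```yaml
version: "1.0"
module:
  meta:
    type: remote
  spec:
    ingresses:
      - ingress:
          meta:
            type: statefun.kafka.io/routable-protobuf-ingress
            id: example/my-ingress                        # placeholder
          spec:
            address: pkc-xxxxx.region.confluent.cloud:9092  # placeholder
            consumerGroupId: my-group
            topics:
              - topic: my-topic
                typeUrl: com.googleapis/example.MyMessage   # placeholder
                targets:
                  - example/my-function
            properties:
              - security.protocol: SASL_SSL
              - sasl.mechanism: PLAIN
              - sasl.jaas.config: org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
```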

Re: autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-07 Thread Dylan Forciea
I hadn’t heard a response on this, so I’m going to expand this to the dev email list. If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR. Regards, Dylan Forciea Oseberg From:

Re: The file STDOUT does not exist on the TaskExecutor

2020-10-07 Thread sidhant gupta
++ user On Wed, Oct 7, 2020, 6:47 PM sidhant gupta wrote: > Hi > > I checked in the $FLINK_HOME/logs. The .out file was not there. Can you > suggest what should be the action item ? > > Thanks > Sidhant Gupta > > > On Wed, Oct 7, 2020, 7:17 AM 大森林 wrote: > >> >> check if the .out file is in

Re: Is it possible that late events are processed before the window?

2020-10-07 Thread Ori Popowski
After creating a toy example I think that I've got the concept of lateDataOutput wrong. It seems that the lateDataSideOutput has nothing to do with windowing; when events arrive late they'll just go straight to the side output, and there can never be any window firing of the main flow for that

Flink Kubernetes Libraries

2020-10-07 Thread saksham sapra
Hi, I have made some configuration using this page: https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/kubernetes.html and I am able to run Flink and see the UI, but I need to submit a job using:

Re: Is it possible that late events are processed before the window?

2020-10-07 Thread Ori Popowski
I've made an experiment where I use an evictor on the main window (not the late one), only to write a debug file when the window fires (I don't actually evict events, I've made it so I can write a debug object the moment the window finishes). I can see that indeed the late data window fires

flink configuration: best practice for checkpoint storage secrets

2020-10-07 Thread XU Qinghui
Hello, folks We are trying to use S3 for the checkpoint storage, and this involves some secrets in the configuration. We tried two approaches to configure those secrets: - in the jvm application argument for jobmanager and taskmanager, such as -Ds3.secret-key - in the flink-conf.yaml file for
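A sketch of the second approach described (secrets as ordinary keys in flink-conf.yaml; values and bucket are placeholders):

```yaml
# flink-conf.yaml — static credentials for the S3 checkpoint storage.
# These live in plain text in the file and should be protected accordingly.
s3.access-key: <access-key>
s3.secret-key: <secret-key>
state.checkpoints.dir: s3://my-bucket/checkpoints   # hypothetical bucket
```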

Re: S3 StreamingFileSink issues

2020-10-07 Thread David Anderson
Dan, The first point you've raised is a known issue: When a job is stopped, the unfinished part files are not transitioned to the finished state. This is mentioned in the docs as Important Note 2 [1], and fixing this is waiting on FLIP-46 [2]. That section of the docs also includes some

Re: why we need keyed state and operate state when we already have checkpoint?

2020-10-07 Thread 大森林
Thanks for your replies, I have some understanding now. There are two cases: 1. If I use no keyed state in the program, when it's killed I can only resume from the previous result. 2. If I use keyed state in the program, when it's killed I can resume from the previous result and the previous variable temporary

Re: Live updating Serialization Schemas in Flink

2020-10-07 Thread Dawid Wysakowicz
Hi, Unfortunately I don't have a nice solution for you. I would also generally discourage such a pattern. Usually how multiple/dynamic schemas are used is with a help of schema registry. In that case you have some sort of an id serialized along with records which you can use to look up the

Re: why we need keyed state and operate state when we already have checkpoint?

2020-10-07 Thread Arvid Heise
I think there is some misunderstanding here: a checkpoint IS (a snapshot of) the keyed state and operator state (among a few more things). [1] [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/learn-flink/fault_tolerance.html#definitions On Wed, Oct 7, 2020 at 6:51 AM 大森林 wrote: