Re: Hadoop_Compatibility
Thanks for the response Chesnay, I will try to understand it. If I have doubts I will get back.

Thanks & Regards,
Dinesh.

On Thu, Aug 6, 2020 at 5:11 PM Chesnay Schepler wrote:
> We still offer a flink-shaded-hadoop-2 artifact that you can find on the
> download page: https://flink.apache.org/downloads.html#additional-components
> In 1.9 we changed the artifact name.
>
> Note that we will not release newer versions of this dependency.
>
> As for providing Hadoop classes, there is some guidance in the documentation:
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/hadoop.html#providing-hadoop-classes
> If this does not work for you, and you want to put the required
> dependencies into the lib/ directory, then you will have to look through
> your dependency tree to determine which Hadoop dependencies are required.
> The Flink hadoop-compatibility dependency requires hadoop-common and
> hadoop-mapreduce-client-core, and whatever transitive dependencies these have.
>
> On 06/08/2020 13:08, C DINESH wrote:
> > Hi All,
> >
> > From the 1.9 version onwards there is no flink-shaded-hadoop2 dependency.
> > To use Hadoop APIs like IntWritable and LongWritable, what dependencies
> > do we need to add to use these APIs?
> >
> > I tried searching on Google but was not able to understand the solution.
> > Please guide me.
> >
> > Thanks in advance,
> > Dinesh.
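Following Chesnay's pointer, a minimal pom.xml fragment for the two Hadoop dependencies he names might look like the sketch below. The 2.8.5 version is only an assumption; use whatever Hadoop version matches your cluster:

```xml
<!-- Hadoop classes needed by flink-hadoop-compatibility; versions are examples -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.8.5</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.8.5</version>
</dependency>
```

If you instead place the Hadoop jars into Flink's lib/ directory as Chesnay describes, you would mark these as `<scope>provided</scope>` in the job jar so they are not bundled twice.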
Hadoop_Compatibility
Hi All,

From the 1.9 version onwards there is no flink-shaded-hadoop2 dependency. To use Hadoop APIs like IntWritable and LongWritable, what dependencies do we need to add to use these APIs?

I tried searching on Google but was not able to understand the solution. Please guide me.

Thanks in advance,
Dinesh.
Dashboard Name
Hi All,

In the Flink UI the name of the dashboard is "Apache Flink Dashboard". We have different environments. If I want to change the name of the dashboard, where do I need to change it?

Thanks & Regards,
Dinesh.
ElasticSearch_Sink
Hello All,

Can we implement the two-phase commit protocol for an Elasticsearch sink? Will there be any limitations?

Thanks in advance.

Warm regards,
Dinesh.
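The protocol the question refers to can be sketched without any Flink or Elasticsearch dependency; all class and method names below are illustrative, not a real API (Flink itself ships a `TwoPhaseCommitSinkFunction` base class for sinks whose backing store can stage writes):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the two-phase commit idea: the coordinator first asks
// every participant to "prepare" (stage its writes durably but invisibly);
// only if all succeed does it issue the final "commit", otherwise it aborts.
class TwoPhaseCommitDemo {

    interface Participant {
        boolean prepare();   // stage the writes, but don't expose them yet
        void commit();       // make the staged writes visible
        void abort();        // discard the staged writes
    }

    static String runTransaction(List<Participant> participants) {
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (p.prepare()) {
                prepared.add(p);
            } else {
                // Phase 1 failed for someone: roll back everyone already prepared.
                for (Participant ok : prepared) ok.abort();
                return "ABORTED";
            }
        }
        // Phase 2: all participants prepared successfully.
        for (Participant p : prepared) p.commit();
        return "COMMITTED";
    }
}
```

The practical limitation hinted at in the question is that Elasticsearch offers no transactional "prepare" primitive, so a sink cannot implement the prepare/commit split above directly; idempotent document ids are the usual workaround.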
Mongodb_sink
Hello all,

Can we implement TwoPhaseCommitProtocol for MongoDB to get EXACTLY_ONCE semantics? Will there be any limitations?

Thanks,
Dinesh.
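A common alternative when the store has no prepare/commit primitive is to make the writes idempotent: derive a deterministic key from each record so a replay after failure overwrites instead of duplicating. A minimal sketch, with a Map standing in for a MongoDB collection (no driver calls; all names illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Effectively-once via idempotent upserts: replays of the same record after a
// restart hit the same key and overwrite, so no duplicates accumulate.
class IdempotentUpsertDemo {
    private final Map<String, String> collection = new HashMap<>();

    void upsert(String recordKey, String value) {
        collection.put(recordKey, value); // replay-safe: same key overwrites
    }

    int size() {
        return collection.size();
    }
}
```

In a real job the record key would be something deterministic, such as a source offset or a business id, so that reprocessing after a checkpoint restore is harmless.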
Re: Dynamic source and sink.
Hi Paul,

Thanks for the response. Can you point out an example of how to create a dynamic client or wrapper operator?

Thanks and Regards,
Dinesh.

On Thu, Jul 2, 2020 at 12:28 PM Paul Lam wrote:
> Hi Dinesh,
>
> I think the problem you meet is quite common.
>
> But with the current Flink architecture, operators must be determined at
> compile time (when you submit your job). This is by design, IIUC.
> Suppose the operators were changeable; then Flink would need to go through
> the compile-optimize-schedule phases once a new operator is added. That
> would be little different from restarting the job.
>
> I see two alternative solutions, FYI:
>
> 1. Implement a custom sink function as Danny suggested. The sink function
> dynamically creates a new client for the respective ES cluster on
> receiving a new tenant configuration.
> 2. Still restart the job, and optimize the downtime by using session mode.
>
> Best,
> Paul Lam
>
> On Jul 2, 2020, at 11:23, C DINESH wrote:
>
> Hi Danny,
>
> Thanks for the response.
>
> In short, without restarting we cannot add new sinks or sources.
>
> For better understanding I will explain my problem more clearly.
> My scenario is that I have two topics: one is a configuration topic and
> the second one is event activities.
>
> * In the configuration topic I will get the Kafka cluster details and
>   Elasticsearch cluster details.
> * In the event activities I will get events, and each event will have a
>   tenantId.
> * Suppose we now get data for a new tenantId. I need to send the data to
>   the respective Elasticsearch cluster, which I will only learn at runtime
>   from the configuration topic.
> * Is there a way to add a new Elasticsearch sink to the same job without
>   restarting?
>
> Before starting a job I can create two Elasticsearch sinks and route the
> data to the respective Elasticsearch cluster with a condition. Is there a
> way to do it at runtime?
>
> Thanks and Regards,
> Dinesh.
>
> On Wed, Jul 1, 2020 at 5:24 PM Danny Chan wrote:
>> Sorry, a job graph is solidified when we compile it before submitting to
>> the cluster; it is not dynamic in the way you want.
>>
>> You can write some wrapper operators which respond to your own RPCs
>> to run the appended operators you want,
>> but then you have to keep the consistency semantics by yourself.
>>
>> Best,
>> Danny Chan
>> On Jun 28, 2020, at 3:30 PM +0800, C DINESH wrote:
>>
>> Hi All,
>>
>> In a Flink job I have a pipeline. It is consuming data from one Kafka
>> topic and storing data to an Elasticsearch cluster.
>>
>> Without restarting the job, can we add another Kafka cluster and another
>> Elasticsearch sink to the job? This means I will supply the new Kafka
>> cluster and Elasticsearch details in the topic. After consuming the data,
>> can our Flink job add the new source and sink to the same job?
>>
>> Thanks & Regards,
>> Dinesh.
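Paul's option 1 (a single sink that lazily creates one client per tenant) can be sketched without any Flink dependency; the class below is a plain cache, not a real SinkFunction, and all names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of a per-tenant client registry: the first time a tenantId shows up,
// build a client from that tenant's configuration (as read from the
// configuration topic); afterwards, reuse the cached client. In a real Flink
// job, clientFor() would be called from the sink's invoke()/process() method.
class DynamicClientRouter<C> {
    private final Map<String, C> clientsByTenant = new HashMap<>();
    private final Function<String, C> clientFactory; // builds a client for a tenant

    DynamicClientRouter(Function<String, C> clientFactory) {
        this.clientFactory = clientFactory;
    }

    C clientFor(String tenantId) {
        // Lazily create and cache the client for this tenant.
        return clientsByTenant.computeIfAbsent(tenantId, clientFactory);
    }

    int openClients() {
        return clientsByTenant.size();
    }
}
```

As Danny and Paul note, this keeps the job graph fixed (one sink operator) while still routing each tenant's records to a cluster that is only known at runtime; consistency guarantees for those dynamically created clients remain your responsibility.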
Re: Dynamic source and sink.
Hi Danny,

Thanks for the response.

In short, without restarting we cannot add new sinks or sources.

For better understanding I will explain my problem more clearly. My scenario is that I have two topics: one is a configuration topic and the second one is event activities.

* In the configuration topic I will get the Kafka cluster details and Elasticsearch cluster details.
* In the event activities I will get events, and each event will have a tenantId.
* Suppose we now get data for a new tenantId. I need to send the data to the respective Elasticsearch cluster, which I will only learn at runtime from the configuration topic.
* Is there a way to add a new Elasticsearch sink to the same job without restarting?

Before starting a job I can create two Elasticsearch sinks and route the data to the respective Elasticsearch cluster with a condition. Is there a way to do it at runtime?

Thanks and Regards,
Dinesh.

On Wed, Jul 1, 2020 at 5:24 PM Danny Chan wrote:
> Sorry, a job graph is solidified when we compile it before submitting to
> the cluster; it is not dynamic in the way you want.
>
> You can write some wrapper operators which respond to your own RPCs
> to run the appended operators you want,
> but then you have to keep the consistency semantics by yourself.
>
> Best,
> Danny Chan
> On Jun 28, 2020, at 3:30 PM +0800, C DINESH wrote:
>
> Hi All,
>
> In a Flink job I have a pipeline. It is consuming data from one Kafka
> topic and storing data to an Elasticsearch cluster.
>
> Without restarting the job, can we add another Kafka cluster and another
> Elasticsearch sink to the job? This means I will supply the new Kafka
> cluster and Elasticsearch details in the topic. After consuming the data,
> can our Flink job add the new source and sink to the same job?
>
> Thanks & Regards,
> Dinesh.
Dynamic source and sink.
Hi All,

In a Flink job I have a pipeline. It is consuming data from one Kafka topic and storing data to an Elasticsearch cluster.

Without restarting the job, can we add another Kafka cluster and another Elasticsearch sink to the job? This means I will supply the new Kafka cluster and Elasticsearch details in the topic. After consuming the data, can our Flink job add the new source and sink to the same job?

Thanks & Regards,
Dinesh.
Re: Stateful-fun-Basic-Hello
Hi Team,

I mean to say that now I understand. But flink-conf.yaml is not mentioned on the documentation page.

On Mon, May 25, 2020 at 7:18 PM C DINESH wrote:
> Thanks Gordon,
>
> I read the documentation several times. But I didn't understand at that
> time; flink-conf.yaml is not there.
>
> Can you please suggest
> 1. how to increase parallelism
> 2. how to give checkpoints to the job
>
> As far as I know there is no documentation regarding this. Or are these
> features not there yet?
>
> Cheers,
> Dinesh.
Re: How to control the balance by using FlinkKafkaConsumer.setStartFromTimestamp(timestamp)
Hi Jary,

My first point is wrong. If we give these settings, Flink will still consume the whole data from the last day. That is what we want, right? Late data is defined by the window length and watermark strategy. Are you combining your streams? Please provide these details so that we can understand the problem more clearly.

Thanks,
Dinesh.

On Mon, May 25, 2020 at 6:58 PM C DINESH wrote:
> Hi Jary,
>
> The easiest and simplest solution is to pass a different config for each
> consumer based on your requirements.
>
> Example:
>
> For creating the consumer for topic A you can pass the config
> max.poll.records: "1"
> max.poll.interval.ms: "1000"
>
> For creating the consumer for topic B you can pass the config
> max.poll.records: "3600"
> max.poll.interval.ms: "1000"
>
> 1. But actually your configuration has a flaw when you are using
> setStartFromTimestamp. If topic B is generating 3600 events every second
> and you use setStartFromTimestamp to consume data from the last 24 hours,
> your second consumer will always lag by one day. (It will never consume
> the real-time data.) That is not what we want in streaming.
>
> 2. For Flink we don't need to pass these settings (max.poll.records,
> max.poll.interval.ms). Flink will consume the data in real time by its
> architecture. If your job is consuming data slowly (back pressure), you
> have to increase parallelism. There are several ways to increase
> parallelism (operator level, job level).
>
> I hope I explained it clearly. Please let me know if you need further
> clarifications.
>
> Thanks,
> Dinesh
>
> On Mon, May 25, 2020 at 12:34 PM Jary Zhen wrote:
>> Hi Dinesh, thanks for your reply.
>>
>> For example, there are two topics: topic A produces 1 record per second
>> and topic B produces 3600 records per second. If I set the Kafka consumer
>> config like this:
>> max.poll.records: "3600"
>> max.poll.interval.ms: "1000"
>> then I can get all the records from these two topics every second in real
>> time.
>> But if I want to consume the data from the last day or earlier days using
>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), I will get 3600
>> records within one second from topic A, which were produced over an hour
>> in the production environment; at the same time, I will get 3600 records
>> within one second from topic B, which were produced in a second. So using
>> event-time semantics, the watermark assigned from topic A will always make
>> the data from topic B "late data" in the window operator. What I want is
>> 1 record from A and 3600 records from B when using
>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), so that I can
>> simulate consuming data as in the real production environment.
>>
>> Best
>>
>> On Sat, 23 May 2020 at 23:42, C DINESH wrote:
>>> Hi Jary,
>>>
>>> What do you mean by step balance? Could you please provide a concrete
>>> example.
>>>
>>> On Fri, May 22, 2020 at 3:46 PM Jary Zhen wrote:
>>>> Hello everyone,
>>>>
>>>> First, a brief pipeline introduction:
>>>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>>> consume multiple Kafka topics
>>>> -> union them
>>>> -> assignTimestampsAndWatermarks
>>>> -> keyBy
>>>> -> window() and so on...
>>>> It's a very normal way to use Flink to process data like this in a
>>>> production environment.
>>>> But if I want to test the pipeline above, I need to use the
>>>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp) API to consume
>>>> "past" data. So my question is how to control the "step" balance, as
>>>> one topic produces 3 records per second while another produces 3600
>>>> per second.
>>>>
>>>> I don't know if I have described this clearly, so if anything is
>>>> unclear please let me know.
>>>>
>>>> Thanks
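The per-topic configuration idea from the first reply can be sketched as plain consumer Properties; the broker address and group ids are placeholders, and as the thread notes, Flink's own fetching usually makes these knobs unnecessary:

```java
import java.util.Properties;

// Build a separate Kafka consumer config per topic, so the slow topic (A)
// and the fast topic (B) can poll at different rates. Each Properties object
// would be passed to its own FlinkKafkaConsumer.
class PerTopicKafkaConfig {
    static Properties consumerConfig(String groupId, int maxPollRecords) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", groupId);
        props.setProperty("max.poll.records", String.valueOf(maxPollRecords));
        props.setProperty("max.poll.interval.ms", "1000");
        return props;
    }
}
```

Usage would look like `consumerConfig("group-a", 1)` for topic A and `consumerConfig("group-b", 3600)` for topic B, each handed to its own consumer instance.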
Re: Stateful-fun-Basic-Hello
Thanks Gordon,

I read the documentation several times. But I didn't understand at that time; flink-conf.yaml is not there.

Can you please suggest
1. how to increase parallelism
2. how to give checkpoints to the job

As far as I know there is no documentation regarding this. Or are these features not there yet?

Cheers,
Dinesh.
Re: How to control the balance by using FlinkKafkaConsumer.setStartFromTimestamp(timestamp)
Hi Jary,

The easiest and simplest solution is to pass a different config for each consumer based on your requirements.

Example:

For creating the consumer for topic A you can pass the config
max.poll.records: "1"
max.poll.interval.ms: "1000"

For creating the consumer for topic B you can pass the config
max.poll.records: "3600"
max.poll.interval.ms: "1000"

1. But actually your configuration has a flaw when you are using setStartFromTimestamp. If topic B is generating 3600 events every second and you use setStartFromTimestamp to consume data from the last 24 hours, your second consumer will always lag by one day. (It will never consume the real-time data.) That is not what we want in streaming.

2. For Flink we don't need to pass these settings (max.poll.records, max.poll.interval.ms). Flink will consume the data in real time by its architecture. If your job is consuming data slowly (back pressure), you have to increase parallelism. There are several ways to increase parallelism (operator level, job level).

I hope I explained it clearly. Please let me know if you need further clarifications.

Thanks,
Dinesh

On Mon, May 25, 2020 at 12:34 PM Jary Zhen wrote:
> Hi Dinesh, thanks for your reply.
>
> For example, there are two topics: topic A produces 1 record per second
> and topic B produces 3600 records per second. If I set the Kafka consumer
> config like this:
> max.poll.records: "3600"
> max.poll.interval.ms: "1000"
> then I can get all the records from these two topics every second in real
> time.
> But if I want to consume the data from the last day or earlier days using
> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), I will get 3600
> records within one second from topic A, which were produced over an hour
> in the production environment; at the same time, I will get 3600 records
> within one second from topic B, which were produced in a second. So using
> event-time semantics, the watermark assigned from topic A will always make
> the data from topic B "late data" in the window operator. What I want is
> 1 record from A and 3600 records from B when using
> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), so that I can
> simulate consuming data as in the real production environment.
>
> Best
>
> On Sat, 23 May 2020 at 23:42, C DINESH wrote:
>> Hi Jary,
>>
>> What do you mean by step balance? Could you please provide a concrete
>> example.
>>
>> On Fri, May 22, 2020 at 3:46 PM Jary Zhen wrote:
>>> Hello everyone,
>>>
>>> First, a brief pipeline introduction:
>>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>> consume multiple Kafka topics
>>> -> union them
>>> -> assignTimestampsAndWatermarks
>>> -> keyBy
>>> -> window() and so on...
>>> It's a very normal way to use Flink to process data like this in a
>>> production environment.
>>> But if I want to test the pipeline above, I need to use the
>>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp) API to consume
>>> "past" data. So my question is how to control the "step" balance, as
>>> one topic produces 3 records per second while another produces 3600
>>> per second.
>>>
>>> I don't know if I have described this clearly, so if anything is
>>> unclear please let me know.
>>>
>>> Thanks
Re: Query Rest API from IDE during runtime
Hi Annemarie,

You need to use an HTTP client to connect to the JobManager.

// Creating an HttpClient object
CloseableHttpClient httpclient = HttpClients.createDefault();
// Creating an HttpGet object
HttpGet httpget = new HttpGet("https://${jobmanager:port}/jobs");
// Executing the GET request
HttpResponse httpresponse = httpclient.execute(httpget);

From the httpresponse you will get all the running job details.

On Fri, May 22, 2020 at 9:22 PM Annemarie Burger <annemarie.bur...@campus.tu-berlin.de> wrote:
> Hi,
>
> I want to query Flink's REST API in my IDE during runtime in order to get
> the jobID of the job that is currently running. Is there any way to do
> this? I found the RestClient class, but can't seem to figure out how to
> exactly make this work. Any help much appreciated.
>
> Best,
> Annemarie
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
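The Apache HttpClient snippet above can also be written with the JDK's built-in java.net.http client (Java 11+), which avoids an extra dependency. The base URL below assumes the JobManager's default REST address; the class and method names are illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Query the JobManager's REST endpoint for the currently running jobs
// using only the JDK's HTTP client (Java 11+).
class FlinkRestJobs {
    static HttpRequest jobsRequest(String jobManagerBaseUrl) {
        return HttpRequest.newBuilder()
                .uri(URI.create(jobManagerBaseUrl + "/jobs"))
                .GET()
                .build();
    }

    static String fetchJobs(String jobManagerBaseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(jobsRequest(jobManagerBaseUrl), HttpResponse.BodyHandlers.ofString());
        return response.body(); // JSON listing job ids and their status
    }
}
```

For a local cluster you would call `fetchJobs("http://localhost:8081")` and parse the job ids out of the returned JSON.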
stateful-fun2.0 checkpointing
Hi Team,

1. How can we enable checkpointing in Stateful Functions 2.0?
2. How to set parallelism?

Thanks,
Dinesh.
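As the later replies in this archive note, both settings live in the flink-conf.yaml of the Flink distribution that runs the StateFun job. A sketch, assuming the standard Flink configuration keys (the values are examples, not recommendations):

```yaml
# flink-conf.yaml (example values)
execution.checkpointing.interval: 10s                  # enable periodic checkpoints
state.checkpoints.dir: file:///tmp/flink-checkpoints   # where checkpoint data is stored
parallelism.default: 4                                 # default parallelism for the job
```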
Re: How to control the balance by using FlinkKafkaConsumer.setStartFromTimestamp(timestamp)
Hi Jary,

What do you mean by step balance? Could you please provide a concrete example.

On Fri, May 22, 2020 at 3:46 PM Jary Zhen wrote:
> Hello everyone,
>
> First, a brief pipeline introduction:
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
> consume multiple Kafka topics
> -> union them
> -> assignTimestampsAndWatermarks
> -> keyBy
> -> window() and so on...
> It's a very normal way to use Flink to process data like this in a
> production environment.
> But if I want to test the pipeline above, I need to use the
> FlinkKafkaConsumer.setStartFromTimestamp(timestamp) API to consume "past"
> data. So my question is how to control the "step" balance, as one topic
> produces 3 records per second while another produces 3600 per second.
>
> I don't know if I have described this clearly, so if anything is unclear
> please let me know.
>
> Thanks
Stateful-fun-Basic-Hello
Hi Team,

I am writing my first Stateful Functions basic hello application. I am getting the following exception.

$ ./bin/flink run -c org.apache.flink.statefun.flink.core.StatefulFunctionsJob ./stateful-sun-hello-java-1.0-SNAPSHOT-jar-with-dependencies.jar

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Invalid configuration: classloader.parent-first-patterns.additional; Must contain all of org.apache.flink.statefun, org.apache.kafka, com.google.protobuf
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
    at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.statefun.flink.core.exceptions.StatefulFunctionsInvalidConfigException: Invalid configuration: classloader.parent-first-patterns.additional; Must contain all of org.apache.flink.statefun, org.apache.kafka, com.google.protobuf
    at org.apache.flink.statefun.flink.core.StatefulFunctionsConfigValidator.validateParentFirstClassloaderPatterns(StatefulFunctionsConfigValidator.java:55)
    at org.apache.flink.statefun.flink.core.StatefulFunctionsConfigValidator.validate(StatefulFunctionsConfigValidator.java:44)
    at org.apache.flink.statefun.flink.core.StatefulFunctionsConfig.<init>(StatefulFunctionsConfig.java:143)
    at org.apache.flink.statefun.flink.core.StatefulFunctionsConfig.fromEnvironment(StatefulFunctionsConfig.java:105)

This is my POM file. I hope I have added all the dependencies. Please suggest what to do.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>stateful-sun-hello-java</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
            <version>3.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-sdk</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-flink-distribution</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-kafka-io</artifactId>
            <version>2.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>com.github.os72</groupId>
                <artifactId>protoc-jar-maven-plugin</artifactId>
                <version>3.6.0.1</version>
                <!-- execution: phase generate-sources, goal run; inputs from
                     src/main/protobuf; outputs java and grpc-java
                     (io.grpc:protoc-gen-grpc-java:1.15.0) to src/main/java -->
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>org.apache.flink.statefun.flink.core.StatefulFunctionsJob</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Thanks,
Dinesh
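The validator message itself spells out the fix: the listed packages must be loaded parent-first. Assuming the job runs on a standard Flink distribution, adding the line below to that distribution's flink-conf.yaml should satisfy StatefulFunctionsConfigValidator:

```yaml
classloader.parent-first-patterns.additional: org.apache.flink.statefun;org.apache.kafka;com.google.protobuf
```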
Statefulfun.io
Hi Team,

Is Streaming Ledger replaced by statefulfun.io? Or am I missing something?

Thanks and regards,
Dinesh.