Re: Hadoop_Compatibility

2020-08-10 Thread C DINESH
Thanks for the response Chesnay,

I will try to understand it. If I have doubts, I will get back to you.

Thanks & Regards,
Dinesh.

On Thu, Aug 6, 2020 at 5:11 PM Chesnay Schepler  wrote:

> We still offer a flink-shaded-hadoop-2 artifact that you can find on the
> download page:
> https://flink.apache.org/downloads.html#additional-components
> In 1.9 we changed the artifact name.
>
> Note that we will not release newer versions of this dependency.
>
> As for providing Hadoop classes, there is some guidance in the documentation
> <https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/hadoop.html#providing-hadoop-classes>
> .
> If this does not work for you, and you want to put the required
> dependencies into the lib/ directory, then you will have to look through
> your dependency tree to determine which Hadoop dependencies are required.
> The Flink hadoop-compatibility dependency requires hadoop-common and
> hadoop-mapreduce-client-core, and whatever transitive dependencies these
> have.
>
> On 06/08/2020 13:08, C DINESH wrote:
>
> Hi All,
>
> Since version 1.9 there is no flink-shaded-hadoop2 dependency. What
> dependencies do we need to add to use Hadoop APIs like IntWritable and
> LongWritable?
>
> I tried searching on Google but could not make sense of the solution.
> Please guide me.
>
> Thanks in Advance.
> Dinesh.
>
>
>
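For illustration, here is a minimal job that exercises the Hadoop Writable
types mentioned above. It assumes hadoop-common (which provides IntWritable
and LongWritable) and flink-hadoop-compatibility (which provides the type
information for Writable types) are on the classpath; the class and job names
are placeholders.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;

public class WritableExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // IntWritable / LongWritable come from hadoop-common; flink-hadoop-compatibility
        // supplies the TypeInformation that lets Flink serialize Writable types.
        DataStream<Tuple2<LongWritable, IntWritable>> writables = env
                .fromElements(1L, 2L, 3L)
                .map(new MapFunction<Long, Tuple2<LongWritable, IntWritable>>() {
                    @Override
                    public Tuple2<LongWritable, IntWritable> map(Long value) {
                        return Tuple2.of(new LongWritable(value), new IntWritable(value.intValue()));
                    }
                });

        writables.print();
        env.execute("writable-example");
    }
}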


Hadoop_Compatibility

2020-08-06 Thread C DINESH
Hi All,

Since version 1.9 there is no flink-shaded-hadoop2 dependency. What
dependencies do we need to add to use Hadoop APIs like IntWritable and
LongWritable?

I tried searching on Google but could not make sense of the solution. Please
guide me.

Thanks in Advance.
Dinesh.


Dashboard Name

2020-07-19 Thread C DINESH
Hi All,

In the Flink UI the name of the dashboard is "Apache Flink Dashboard".

We have different environments. If I want to change the name of the
dashboard, where do I need to change it?


Thanks & Regards,
Dinesh.


ElasticSearch_Sink

2020-07-15 Thread C DINESH
Hello All,

Can we implement the two-phase commit protocol for an Elasticsearch sink?
Will there be any limitations?

Thanks in advance.

Warm regards,
Dinesh.
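As a rough sketch only: Flink exposes the two-phase commit pattern through the
abstract TwoPhaseCommitSinkFunction, and the skeleton below shows how an
Elasticsearch variant could be shaped. EsTransaction and the buffering
strategy are hypothetical. The main limitation is that Elasticsearch has no
real transactions, so a flush done in preCommit can be replayed after a
failure; in practice exactly-once is usually approximated with idempotent
writes (deterministic document IDs) rather than a true commit step.

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Hypothetical transaction object: documents buffered until the checkpoint completes.
class EsTransaction {
    final List<String> pendingDocs = new ArrayList<>();
}

public class TwoPhaseEsSink extends TwoPhaseCommitSinkFunction<String, EsTransaction, Void> {

    public TwoPhaseEsSink() {
        super(new KryoSerializer<>(EsTransaction.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected EsTransaction beginTransaction() {
        return new EsTransaction();
    }

    @Override
    protected void invoke(EsTransaction txn, String doc, Context context) {
        txn.pendingDocs.add(doc); // buffer only; nothing is visible in Elasticsearch yet
    }

    @Override
    protected void preCommit(EsTransaction txn) {
        // Flush the buffered documents here, e.g. with a bulk request. Because
        // Elasticsearch cannot roll this back, use deterministic document IDs so
        // that a replayed flush overwrites rather than duplicates.
    }

    @Override
    protected void commit(EsTransaction txn) {
        // Nothing left to do when preCommit already flushed; a truly transactional
        // store would make the transaction visible here.
    }

    @Override
    protected void abort(EsTransaction txn) {
        txn.pendingDocs.clear();
    }
}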


Mongodb_sink

2020-07-14 Thread C DINESH
Hello all,

Can we implement a two-phase commit protocol for MongoDB to get EXACTLY_ONCE
semantics? Will there be any limitations?


Thanks,
Dinesh.
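Only as a sketch: the TwoPhaseCommitSinkFunction skeleton shown after the
Elasticsearch question above applies here as well, and MongoDB 4.0+ on a
replica set does offer multi-document transactions for the commit step. The
helper below (hypothetical names, placeholder URI) shows what such a commit
could look like. The remaining limitation is that commit() may be replayed
after a recovery, so writes should still be idempotent (for example upserts
keyed by a deterministic _id) to avoid duplicates.

import java.util.List;

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Hypothetical helper used from TwoPhaseCommitSinkFunction#commit: writes one
// buffered batch atomically inside a MongoDB multi-document transaction.
public class MongoTxnCommitter implements AutoCloseable {

    private final MongoClient client;
    private final MongoCollection<Document> collection;

    public MongoTxnCommitter(String uri, String database, String collectionName) {
        this.client = MongoClients.create(uri); // e.g. "mongodb://host:27017" (placeholder)
        this.collection = client.getDatabase(database).getCollection(collectionName);
    }

    public void commitBatch(List<Document> batch) {
        if (batch.isEmpty()) {
            return;
        }
        try (ClientSession session = client.startSession()) {
            session.startTransaction();
            collection.insertMany(session, batch);
            session.commitTransaction();
        }
    }

    @Override
    public void close() {
        client.close();
    }
}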


Re: Dynamic source and sink.

2020-07-03 Thread C DINESH
Hi Paul,

Thanks for the response.

Could you point me to an example of how to create a dynamic client or
wrapper operator?


Thanks and Regards,
Dinesh.




On Thu, Jul 2, 2020 at 12:28 PM Paul Lam  wrote:

> Hi Dinesh,
>
> I think the problem you are facing is quite common.
>
> But with the current Flink architecture, operators must be determined at
> compile time (when you submit your job). This is by design, IIUC.
> If the operators were changeable, Flink would need to go through
> the compile-optimize-schedule phases every time a new operator
> is added. That would be little different from restarting the job.
>
> I see two alternative solutions, FYI:
>
> 1. Implement a custom sink function as Danny suggested. The sink function
> dynamically creates a new client for the respective ES cluster
>  on receiving a new tenant configuration.
> 2. Still restart the job, and reduce the downtime by using session mode.
>
> Best,
> Paul Lam
>
> 2020年7月2日 11:23,C DINESH  写道:
>
> Hi Danny,
>
> Thanks for the response.
>
> In short, without restarting we cannot add new sinks or sources.
>
> For a better understanding I will explain my problem more clearly.
>
> My scenario: I have two topics, one is a configuration topic and the
> second one carries event activities.
>
> * From the configuration topic I get the Kafka cluster details and
> Elasticsearch cluster details.
> * From the event topic I get events, and each event has a tenantId.
> * When data for a new tenantId arrives, I need to send it to the
> respective Elasticsearch cluster, which I only learn about at runtime from
> the configuration topic.
> * Is there a way to add a new Elasticsearch sink to the same job without
> restarting?
>
> Before starting a job I can create two Elasticsearch sinks and route the
> data to the respective Elasticsearch cluster with a condition. Is there a
> way to do this at runtime?
>
>
> Thanks and Regards,
> Dinesh.
>
>
> On Wed, Jul 1, 2020 at 5:24 PM Danny Chan  wrote:
>
>> Sorry, a job graph is fixed once we compile it before submitting to the
>> cluster; it is not dynamic in the way you want.
>>
>> You could write wrapper operators that respond to your own RPCs
>> to run the appended operators you want,
>> but then you have to maintain the consistency semantics yourself.
>>
>> Best,
>> Danny Chan
>> On 2020-06-28 at 3:30 PM +0800, C DINESH wrote:
>>
>> Hi All,
>>
>> In a Flink job I have a pipeline. It is consuming data from one Kafka
>> topic and storing the data in an Elasticsearch cluster.
>>
>> Without restarting the job, can we add another Kafka cluster and another
>> Elasticsearch sink to the job? That is, I will supply the new Kafka
>> cluster and Elasticsearch details in the topic. After consuming that data,
>> can our Flink job add the new source and sink to the same job?
>>
>>
>> Thanks & Regards,
>> Dinesh.
>>
>>
>
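To make Paul's first suggestion a bit more concrete, here is a rough sketch
(not a drop-in implementation) of a sink that lazily creates one client per
Elasticsearch cluster it sees at runtime. TenantEvent, TenantEsClient and
createClient are hypothetical placeholders, and it assumes each event was
already enriched, for example via a broadcast of the configuration topic,
with the endpoint of its tenant's cluster. It gives no exactly-once
guarantees; the consistency caveat Danny mentioned still applies.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Hypothetical record type: an event already enriched with its tenant's
// Elasticsearch endpoint (e.g. by joining against the broadcast config topic).
class TenantEvent {
    String tenantId;
    String esEndpoint;   // e.g. "http://es-tenant-a:9200" (placeholder)
    String payloadJson;
}

// Hypothetical minimal client wrapper; in practice this would hold a real
// Elasticsearch client for one cluster.
interface TenantEsClient extends AutoCloseable {
    void index(String json) throws Exception;
}

public class DynamicTenantEsSink extends RichSinkFunction<TenantEvent> {

    private transient Map<String, TenantEsClient> clientsByEndpoint;

    @Override
    public void open(Configuration parameters) {
        clientsByEndpoint = new HashMap<>();
    }

    @Override
    public void invoke(TenantEvent event, Context context) throws Exception {
        // Create a client for a previously unseen cluster at runtime; no job restart needed.
        TenantEsClient client = clientsByEndpoint.computeIfAbsent(
                event.esEndpoint, DynamicTenantEsSink::createClient);
        client.index(event.payloadJson);
    }

    private static TenantEsClient createClient(String endpoint) {
        // Placeholder: build and return the real client for this endpoint here.
        throw new UnsupportedOperationException("wire up a real Elasticsearch client for " + endpoint);
    }

    @Override
    public void close() throws Exception {
        if (clientsByEndpoint != null) {
            for (TenantEsClient client : clientsByEndpoint.values()) {
                client.close();
            }
        }
    }
}

In a real job the enrichment step would typically be a BroadcastProcessFunction
that joins the event stream with the broadcast configuration stream before this
sink.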


Re: Dynamic source and sink.

2020-07-01 Thread C DINESH
Hi Danny,

Thanks for the response.

In short, without restarting we cannot add new sinks or sources.

For a better understanding I will explain my problem more clearly.

My scenario: I have two topics, one is a configuration topic and the second
one carries event activities.

* From the configuration topic I get the Kafka cluster details and
Elasticsearch cluster details.
* From the event topic I get events, and each event has a tenantId.
* When data for a new tenantId arrives, I need to send it to the respective
Elasticsearch cluster, which I only learn about at runtime from the
configuration topic.
* Is there a way to add a new Elasticsearch sink to the same job without
restarting?

Before starting a job I can create two Elasticsearch sinks and route the
data to the respective Elasticsearch cluster with a condition. Is there a
way to do this at runtime?


Thanks and Regards,
Dinesh.


On Wed, Jul 1, 2020 at 5:24 PM Danny Chan  wrote:

> Sorry, a job graph is fixed once we compile it before submitting to the
> cluster; it is not dynamic in the way you want.
>
> You could write wrapper operators that respond to your own RPCs
> to run the appended operators you want,
> but then you have to maintain the consistency semantics yourself.
>
> Best,
> Danny Chan
> On 2020-06-28 at 3:30 PM +0800, C DINESH wrote:
>
> Hi All,
>
> In a Flink job I have a pipeline. It is consuming data from one Kafka
> topic and storing the data in an Elasticsearch cluster.
>
> Without restarting the job, can we add another Kafka cluster and another
> Elasticsearch sink to the job? That is, I will supply the new Kafka
> cluster and Elasticsearch details in the topic. After consuming that data,
> can our Flink job add the new source and sink to the same job?
>
>
> Thanks & Regards,
> Dinesh.
>
>


Dynamic source and sink.

2020-06-28 Thread C DINESH
Hi All,

In a Flink job I have a pipeline. It is consuming data from one Kafka topic
and storing the data in an Elasticsearch cluster.

Without restarting the job, can we add another Kafka cluster and another
Elasticsearch sink to the job? That is, I will supply the new Kafka cluster
and Elasticsearch details in the topic. After consuming that data, can our
Flink job add the new source and sink to the same job?


Thanks & Regards,
Dinesh.


Re: Stateful-fun-Basic-Hello

2020-05-25 Thread C DINESH
Hi Team,

I mean to say that now I understand, but flink-conf.yaml is not mentioned
on the documentation page.

On Mon, May 25, 2020 at 7:18 PM C DINESH  wrote:

> Thanks Gordon,
>
> I read the documentation several times, but I didn't understand it at the
> time; flink-conf.yaml is not there.
>
> Can you please suggest:
> 1. how to increase the parallelism
> 2. how to enable checkpointing for the job
>
> As far as I know there is no documentation regarding this. Or are these
> features not there yet?
>
> Cheers,
> Dinesh.
>


Re: How to control the balance by using FlinkKafkaConsumer.setStartFromTimestamp(timestamp)

2020-05-25 Thread C DINESH
Hi Jary,

My first point is wrong. Even with these settings Flink will consume all
the data from the last day. That is what we want, right?

Late data is determined by the window length and the watermark strategy.
Are you combining your streams? Please provide these details so that we can
understand the problem more clearly.

Thanks,
Dinesh.



On Mon, May 25, 2020 at 6:58 PM C DINESH  wrote:

> Hi Jary,
>
> The easiest and simplest solution is to pass a different config to each
> consumer, based on your requirements, when you create it.
>
> Example:
>
> When creating the consumer for topic A you can pass:
> max.poll.records: "1"
> max.poll.interval.ms: "1000"
>
> When creating the consumer for topic B you can pass:
> max.poll.records: "3600"
> max.poll.interval.ms: "1000"
>
>
> 1. But your configuration has a flaw when you use setStartFromTimestamp:
> if topic B generates 3600 events per second and you set
> setStartFromTimestamp to consume data from the last 24 hours, your second
> consumer will always lag by one day (it will never consume the real-time
> data), which is not what we want in streaming.
>
> 2. With Flink we don't need to pass these settings (max.poll.records,
> max.poll.interval.ms). Flink consumes the data in real time by design. If
> your job is consuming data slowly (back pressure), you have to increase the
> parallelism; there are several ways to do that (operator level, job level).
>
>
> I hope I explained it clearly. Please let me know if you need further
> clarification.
>
> Thanks,
> Dinesh
>
>
> On Mon, May 25, 2020 at 12:34 PM Jary Zhen  wrote:
>
>> Hi Dinesh, thanks for your reply.
>>
>> For example, there are two topics: topic A produces 1 record per second
>> and topic B produces 3600 records per second. If I set the Kafka consumer
>> config like this:
>> max.poll.records: "3600"
>> max.poll.interval.ms: "1000"
>> then I can get all the records from these two topics every second in real
>> time.
>> But if I want to consume the data from the last day or earlier by using
>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), I will get 3600
>> records within one second from topic A, which are produced over an hour in
>> the production environment, and at the same time I will get 3600 records
>> within one second from topic B, which are produced in one second. So with
>> EventTime semantics, the watermark assigned from topic A will always make
>> the data from topic B 'late data' in the window operator. What I want is
>> 1 record from A and 3600 records from B by using
>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), so that I can
>> simulate consuming data as in the real production environment.
>>
>>
>> Best
>>
>>
>>
>>
>>
>>
>>
>> On Sat, 23 May 2020 at 23:42, C DINESH  wrote:
>>
>>> Hi Jary,
>>>
>>> What do you mean by "step balance"? Could you please provide a concrete
>>> example?
>>>
>>> On Fri, May 22, 2020 at 3:46 PM Jary Zhen  wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> First, a brief pipeline introduction:
>>>>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>>>   consume multiple Kafka topics
>>>>   -> union them
>>>>   -> assignTimestampsAndWatermarks
>>>>   -> keyby
>>>>   -> window()  and so on …
>>>> It's a very common way to use Flink to process data in a
>>>> production environment.
>>>> But if I want to test the pipeline above, I need to use
>>>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp) to consume 'past' data.
>>>> So my question is how to control the 'step' balance, as one topic
>>>> produces 3 records per second while another topic produces 3 per
>>>> second.
>>>>
>>>> I don't know if I have described this clearly, so if anything is unclear
>>>> please let me know.
>>>>
>>>> Tks
>>>>
>>>>


Re: Stateful-fun-Basic-Hello

2020-05-25 Thread C DINESH
Thanks Gordon,

I read the documentation several times, but I didn't understand it at the
time; flink-conf.yaml is not there.

Can you please suggest:
1. how to increase the parallelism
2. how to enable checkpointing for the job

As far as I know there is no documentation regarding this. Or are these
features not there yet?

Cheers,
Dinesh.


Re: How to control the balance by using FlinkKafkaConsumer.setStartFromTimestamp(timestamp)

2020-05-25 Thread C DINESH
Hi Jary,

The easiest and simplest solution is to pass a different config to each
consumer, based on your requirements, when you create it.

Example:

When creating the consumer for topic A you can pass:
max.poll.records: "1"
max.poll.interval.ms: "1000"

When creating the consumer for topic B you can pass:
max.poll.records: "3600"
max.poll.interval.ms: "1000"


1. But your configuration has a flaw when you use setStartFromTimestamp:
if topic B generates 3600 events per second and you set
setStartFromTimestamp to consume data from the last 24 hours, your second
consumer will always lag by one day (it will never consume the real-time
data), which is not what we want in streaming.

2. With Flink we don't need to pass these settings (max.poll.records,
max.poll.interval.ms). Flink consumes the data in real time by design. If
your job is consuming data slowly (back pressure), you have to increase the
parallelism; there are several ways to do that (operator level, job level).


I hope I explained it clearly. Please let me know if you need further
clarification.

Thanks,
Dinesh


On Mon, May 25, 2020 at 12:34 PM Jary Zhen  wrote:

> Hi Dinesh, thanks for your reply.
>
> For example, there are two topics: topic A produces 1 record per second
> and topic B produces 3600 records per second. If I set the Kafka consumer
> config like this:
> max.poll.records: "3600"
> max.poll.interval.ms: "1000"
> then I can get all the records from these two topics every second in real
> time.
> But if I want to consume the data from the last day or earlier by using
> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), I will get 3600
> records within one second from topic A, which are produced over an hour in
> the production environment, and at the same time I will get 3600 records
> within one second from topic B, which are produced in one second. So with
> EventTime semantics, the watermark assigned from topic A will always make
> the data from topic B 'late data' in the window operator. What I want is
> 1 record from A and 3600 records from B by using
> FlinkKafkaConsumer.setStartFromTimestamp(timestamp), so that I can
> simulate consuming data as in the real production environment.
>
>
> Best
>
>
>
>
>
>
>
> On Sat, 23 May 2020 at 23:42, C DINESH  wrote:
>
>> Hi Jary,
>>
>> What do you mean by "step balance"? Could you please provide a concrete
>> example?
>>
>> On Fri, May 22, 2020 at 3:46 PM Jary Zhen  wrote:
>>
>>> Hello everyone,
>>>
>>> First, a brief pipeline introduction:
>>>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>>>   consume multiple Kafka topics
>>>   -> union them
>>>   -> assignTimestampsAndWatermarks
>>>   -> keyby
>>>   -> window()  and so on …
>>> It's a very common way to use Flink to process data in a production
>>> environment.
>>> But if I want to test the pipeline above, I need to use
>>> FlinkKafkaConsumer.setStartFromTimestamp(timestamp) to consume 'past' data.
>>> So my question is how to control the 'step' balance, as one topic
>>> produces 3 records per second while another topic produces 3 per second.
>>>
>>> I don't know if I have described this clearly, so if anything is unclear
>>> please let me know.
>>>
>>> Tks
>>>
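To make the per-topic consumer configuration discussed in this thread
concrete, a minimal sketch follows. The bootstrap servers, group id and topic
names are placeholders, and max.poll.records only shapes how the Kafka client
fetches; it does not control event-time alignment between the two topics.

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ReplayTwoTopics {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Separate property sets per topic; max.poll.records etc. are plain Kafka
        // consumer properties and are passed through to the underlying client.
        Properties propsA = new Properties();
        propsA.setProperty("bootstrap.servers", "kafka:9092"); // placeholder address
        propsA.setProperty("group.id", "replay-test");
        propsA.setProperty("max.poll.records", "1");

        Properties propsB = new Properties();
        propsB.setProperty("bootstrap.servers", "kafka:9092");
        propsB.setProperty("group.id", "replay-test");
        propsB.setProperty("max.poll.records", "3600");

        FlinkKafkaConsumer<String> consumerA =
                new FlinkKafkaConsumer<>("topicA", new SimpleStringSchema(), propsA);
        FlinkKafkaConsumer<String> consumerB =
                new FlinkKafkaConsumer<>("topicB", new SimpleStringSchema(), propsB);

        // Replay both topics from 24 hours ago.
        long oneDayAgo = System.currentTimeMillis() - 24 * 60 * 60 * 1000L;
        consumerA.setStartFromTimestamp(oneDayAgo);
        consumerB.setStartFromTimestamp(oneDayAgo);

        DataStream<String> merged = env.addSource(consumerA).union(env.addSource(consumerB));
        merged.print();
        env.execute("replay-two-topics");
    }
}

Aligning the event-time progress of the two replayed topics is a watermark
concern; the per-consumer poll settings above do not change which records the
window operator treats as late.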
>>>


Re: Query Rest API from IDE during runtime

2020-05-23 Thread C DINESH
Hi Annemarie,

You need to use an HTTP client to connect to the JobManager's REST endpoint.



  // Imports (Apache HttpClient)
  import org.apache.http.HttpResponse;
  import org.apache.http.client.methods.HttpGet;
  import org.apache.http.impl.client.CloseableHttpClient;
  import org.apache.http.impl.client.HttpClients;
  import org.apache.http.util.EntityUtils;

  // Create a HttpClient object
  CloseableHttpClient httpclient = HttpClients.createDefault();

  // Create a HttpGet object for the JobManager's REST endpoint
  HttpGet httpget = new HttpGet("http://<jobmanager-host>:<port>/jobs");

  // Execute the GET request and read the response body
  HttpResponse httpresponse = httpclient.execute(httpget);
  String body = EntityUtils.toString(httpresponse.getEntity());


The response body lists all jobs with their IDs and statuses, so you can pick
out the jobID of the running job from there.


On Fri, May 22, 2020 at 9:22 PM Annemarie Burger <
annemarie.bur...@campus.tu-berlin.de> wrote:

> Hi,
>
> I want to query Flink's REST API in my IDE during runtime in order to get
> the jobID of the job that is currently running. Is there any way to do
> this?
> I found the RestClient class, but can't seem to figure out how to exactly
> make this work. Any help much appreciated.
>
> Best,
> Annemarie
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


stateful-fun2.0 checkpointing

2020-05-23 Thread C DINESH
Hi Team,

1. How can we enable checkpointing in Stateful Functions 2.0?
2. How do we set the parallelism?

Thanks,
Dinesh.


Re: How to control the balance by using FlinkKafkaConsumer.setStartFromTimestamp(timestamp)

2020-05-23 Thread C DINESH
Hi Jary,

What do you mean by "step balance"? Could you please provide a concrete example?

On Fri, May 22, 2020 at 3:46 PM Jary Zhen  wrote:

> Hello everyone,
>
> First, a brief pipeline introduction:
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   consume multiple Kafka topics
>   -> union them
>   -> assignTimestampsAndWatermarks
>   -> keyby
>   -> window()  and so on …
> It's a very common way to use Flink to process data in a production
> environment.
> But if I want to test the pipeline above, I need to use
> FlinkKafkaConsumer.setStartFromTimestamp(timestamp) to consume 'past' data.
> So my question is how to control the 'step' balance, as one topic produces
> 3 records per second while another topic produces 3 per second.
>
> I don't know if I have described this clearly, so if anything is unclear
> please let me know.
>
> Tks
>
>


Stateful-fun-Basic-Hello

2020-05-23 Thread C DINESH
Hi Team,

I am writing my first Stateful Functions basic hello application. I am
getting the following exception.

$ ./bin/flink run -c
org.apache.flink.statefun.flink.core.StatefulFunctionsJob
./stateful-sun-hello-java-1.0-SNAPSHOT-jar-with-dependencies.jar





 The program finished with the following exception:


org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error: Invalid configuration:
classloader.parent-first-patterns.additional; Must contain all of
org.apache.flink.statefun, org.apache.kafka, com.google.protobuf

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)

at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)

at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)

at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)

at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)

at
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)

at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)

at
org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)

Caused by:
org.apache.flink.statefun.flink.core.exceptions.StatefulFunctionsInvalidConfigException:
Invalid configuration: classloader.parent-first-patterns.additional; Must
contain all of org.apache.flink.statefun, org.apache.kafka,
com.google.protobuf

at
org.apache.flink.statefun.flink.core.StatefulFunctionsConfigValidator.validateParentFirstClassloaderPatterns(StatefulFunctionsConfigValidator.java:55)

at
org.apache.flink.statefun.flink.core.StatefulFunctionsConfigValidator.validate(StatefulFunctionsConfigValidator.java:44)

at
org.apache.flink.statefun.flink.core.StatefulFunctionsConfig.<init>(StatefulFunctionsConfig.java:143)

at
org.apache.flink.statefun.flink.core.StatefulFunctionsConfig.fromEnvironment(StatefulFunctionsConfig.java:105)

This is my POM file; I hope I have added all the dependencies. Please
suggest what I should do.


<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>stateful-sun-hello-java</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>com.google.protobuf</groupId>
            <artifactId>protobuf-java</artifactId>
            <version>3.6.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-sdk</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-flink-distribution</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>statefun-kafka-io</artifactId>
            <version>2.0.0</version>
        </dependency>
    </dependencies>

    <build>
        <defaultGoal>clean generate-sources compile install</defaultGoal>

        <plugins>
            <plugin>
                <groupId>com.github.os72</groupId>
                <artifactId>protoc-jar-maven-plugin</artifactId>
                <version>3.6.0.1</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>run</goal>
                        </goals>
                        <configuration>
                            <includeMavenTypes>direct</includeMavenTypes>
                            <inputDirectories>
                                <include>src/main/protobuf</include>
                            </inputDirectories>
                            <outputTargets>
                                <outputTarget>
                                    <type>java</type>
                                    <outputDirectory>src/main/java</outputDirectory>
                                </outputTarget>
                                <outputTarget>
                                    <type>grpc-java</type>
                                    <pluginArtifact>io.grpc:protoc-gen-grpc-java:1.15.0</pluginArtifact>
                                    <outputDirectory>src/main/java</outputDirectory>
                                </outputTarget>
                            </outputTargets>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>org.apache.flink.statefun.flink.core.StatefulFunctionsJob</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Thanks,
Dinesh


Statefulfun.io

2020-05-19 Thread C DINESH
Hi Team,

Has Streaming Ledger been replaced by statefulfun.io, or am I missing
something?

Thanks and regards,
Dinesh.