[Question] enable end-to-end Kafka exactly-once processing

2020-03-01 by Jin Yi
Hi experts, my application uses Apache Beam with Flink as the runner. My source and sink are Kafka topics, and I am using the KafkaIO connector provided by Apache Beam to consume and publish. I am reading through Beam's javadoc: https://beam.apache.org/releases/javadoc/2.16.0/org/apache/b
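For reference, a minimal sketch of what opting into KafkaIO's exactly-once sink looks like with the Beam 2.16 Java SDK; the broker address, topic name, and sink group id below are placeholders, and on the Flink runner this additionally requires checkpointing to be enabled:

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.kafka.common.serialization.StringSerializer;

// `output` is the PCollection<KV<String, String>> to publish.
static void writeExactlyOnce(PCollection<KV<String, String>> output) {
  output.apply(
      KafkaIO.<String, String>write()
          .withBootstrapServers("broker:9092")   // placeholder address
          .withTopic("sink-topic")               // placeholder topic
          .withKeySerializer(StringSerializer.class)
          .withValueSerializer(StringSerializer.class)
          // Opt into the exactly-once sink: numShards fixes the number of
          // writer shards, sinkGroupId identifies this sink across restarts.
          .withEOS(1, "sink-group-id"));
}

Note that withEOS only gives end-to-end exactly-once when the runner supports it and downstream consumers read with isolation.level=read_committed.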

Re: Apache Beam Side input vs Flink Broadcast Stream

2020-02-28 by Jin Yi
m+API > > On Thu, Feb 27, 2020 at 6:46 AM Jin Yi wrote: > >> Hi All, >> >> there is a recently published article on the official Flink website for >> running Beam on top of Flink >> https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of

Apache Beam Side input vs Flink Broadcast Stream

2020-02-26 by Jin Yi
Hi All, there is a recently published article on the official Flink website about running Beam on top of Flink: https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html In the article: - You get additional features like side inputs and cross-language pipeline
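For context, a minimal sketch of a Beam side input with the Java SDK; `events` and `configs` are hypothetical collections, and View.asMap() requires the side collection to be keyed:

import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

// Makes the small `configs` collection available to every worker as a side input.
static PCollection<String> enrich(
    PCollection<String> events, PCollection<KV<String, String>> configs) {
  final PCollectionView<Map<String, String>> configView = configs.apply(View.asMap());
  return events.apply(
      ParDo.of(new DoFn<String, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
              // The side input is resolved per window at processing time.
              Map<String, String> config = c.sideInput(configView);
              c.output(c.element() + " (configs: " + config.size() + ")");
            }
          })
          .withSideInputs(configView));
}

This is the Beam-level construct that the thread compares against Flink's native broadcast stream.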

Re: How does Flink manage the kafka offset

2020-02-20 by Jin Yi
this consume group has joined, and it will >> rebalance the partition consumption if needed. > > > No. Flink does not rebalance the partitions when new task managers joined > cluster. It only did so when job restarts and job parallelism changes. > > Hope it helps. > >

How does Flink manage the kafka offset

2020-02-20 by Jin Yi
Hi there, we are running an Apache Beam application with Flink as the runner. We use the KafkaIO connector to read from topics: https://beam.apache.org/releases/javadoc/2.19.0/ and we have the following configuration, which enables auto commit of offsets; no checkpointing is enabled, and it is pe
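For illustration, a minimal sketch of a KafkaIO read with Kafka-side auto commit, assuming the Beam 2.19 Java SDK; the group id, broker address, and topic are placeholders. With no Flink checkpointing enabled, a restart resumes from the offsets last auto-committed to Kafka, so delivery is at-least-once or at-most-once depending on commit timing:

import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

static KafkaIO.Read<String, String> buildRead() {
  Map<String, Object> consumerConfig = new HashMap<>();
  consumerConfig.put("group.id", "my-consumer-group");  // placeholder group
  consumerConfig.put("enable.auto.commit", "true");     // Kafka-side auto commit
  return KafkaIO.<String, String>read()
      .withBootstrapServers("broker:9092")              // placeholder address
      .withTopic("source-topic")                        // placeholder topic
      .withKeyDeserializer(StringDeserializer.class)
      .withValueDeserializer(StringDeserializer.class)
      .withConsumerConfigUpdates(consumerConfig);
}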

[Question] how to have additional Logging in Apache Beam KafkaWriter

2020-02-18 by Jin Yi
Hi there, I am using Apache Beam (v2.16) in my application, and the runner is Flink (1.8). I use the KafkaIO connector to consume from source topics and publish to sink topics. Here is the class that Apache Beam provides for publishing messages: https://github.com/apache/beam/blob/master/sdks/java/io/
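One way to get extra logging around publishes without patching Beam's KafkaWriter is a standard Kafka ProducerInterceptor, registered through the producer properties that KafkaIO accepts; this is a sketch, not Beam API, and the class name is hypothetical:

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Logs every record sent and its acknowledgement (or failure). */
public class LoggingInterceptor implements ProducerInterceptor<String, String> {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingInterceptor.class);

  @Override
  public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
    LOG.info("Sending to topic={} key={}", record.topic(), record.key());
    return record;
  }

  @Override
  public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
    if (exception != null) {
      LOG.warn("Publish failed", exception);
    } else {
      LOG.info("Acked {}-{}@{}", metadata.topic(), metadata.partition(), metadata.offset());
    }
  }

  @Override public void close() {}
  @Override public void configure(Map<String, ?> configs) {}
}

The interceptor is enabled with the standard producer setting interceptor.classes=com.example.LoggingInterceptor (package name hypothetical) in the producer properties map handed to KafkaIO.write().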

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-27 by Jin Yi
.html#why-dataset-api > [2] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html#provided-apis > [3] > https://github.com/apache/flink/blob/53f956fb57dd5601d2e3ca9289207d21796cdc4d/flink-streaming-java/src/main/java/org/apache/flink/streaming/api

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 by Jin Yi
: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#checkpointedfunction Thanks a lot for your help! Eleanore On Sun, Jan 26, 2020 at 6:53 PM Jin Yi wrote: > Hi Yun, > > Thanks for the response, I have checked official document, and I have > referred thi

Re: [State Processor API] how to convert savepoint back to broadcast state

2020-01-26 by Jin Yi
-docs-stable/dev/libs/state_processor_api.html#broadcast-state-1 > > Best > Yun Tang > ------ > *From:* Jin Yi > *Sent:* Thursday, January 23, 2020 8:12 > *To:* user ; user-zh@flink.apache.org < > user-zh@flink.apache.org> > *Subject:* [State Processor A

[State Processor API] how to convert savepoint back to broadcast state

2020-01-22 by Jin Yi
Hi there, I would like to read the savepoints (for broadcast state) back into the broadcast state; how should I do it? // load the existingSavepoint; ExistingSavepoint existingSavepoint = Savepoint.load(environment, "file:///tmp/new_savepoints", new MemoryStateBackend()); // read state from exis
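Continuing the snippet above, a minimal sketch of reading broadcast state back with the State Processor API (Flink 1.9); the operator uid and state name are placeholders that must match the job that took the savepoint:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

static DataSet<Tuple2<String, String>> readBack(ExecutionEnvironment environment)
    throws Exception {
  ExistingSavepoint existingSavepoint =
      Savepoint.load(environment, "file:///tmp/new_savepoints", new MemoryStateBackend());
  // Returns the broadcast MapState as key/value pairs; "broadcast-op-uid" and
  // "broadcastStateName" must match the uid and MapStateDescriptor of the job.
  return existingSavepoint.readBroadcastState(
      "broadcast-op-uid", "broadcastStateName", Types.STRING, Types.STRING);
}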

Re: Question regarding checkpoint/savepoint and State Processor API

2020-01-21 by Jin Yi
t and stop of your job. It is just to bootstrap the > initial state of your application. After that, you will use savepoints to > carry over the current state of your applications between runs. > > > > On Mon, Jan 20, 2020 at 6:07 PM Jin Yi wrote: > >> Hi there, >> &g

Question regarding checkpoint/savepoint and State Processor API

2020-01-20 by Jin Yi
Hi there, 1. in my job, I have a broadcast stream; initially there is no savepoint that can be used to bootstrap the broadcast stream state. BootstrapTransformation transform = OperatorTransformation.bootstrapWith(dataSet).transform(bootstrapFunction); Savepoint.create(new MemoryStateBac
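A minimal sketch of how that bootstrap can be completed and written out, assuming String-typed broadcast state; the descriptor name, operator uid, max parallelism, and path are placeholders:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.BroadcastStateBootstrapFunction;

static void bootstrap(DataSet<String> dataSet) throws Exception {
  final MapStateDescriptor<String, String> descriptor =
      new MapStateDescriptor<>("broadcastStateName", Types.STRING, Types.STRING);

  // Put every element of the bootstrap DataSet into the broadcast MapState.
  BootstrapTransformation<String> transform =
      OperatorTransformation.bootstrapWith(dataSet)
          .transform(new BroadcastStateBootstrapFunction<String>() {
            @Override
            public void processElement(String value, Context ctx) throws Exception {
              ctx.getBroadcastState(descriptor).put(value, value);
            }
          });

  Savepoint.create(new MemoryStateBackend(), 128)    // 128 = max parallelism
      .withOperator("broadcast-op-uid", transform)   // must match the job's uid
      .write("file:///tmp/new_savepoints");
  dataSet.getExecutionEnvironment().execute("bootstrap broadcast state");
}

The streaming job is then started from this savepoint; later savepoints carry the state forward between runs, as the reply above notes.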

Re: Filter with large key set

2020-01-20 by Jin Yi
eInformation, > serializer, and comparator yourself. The Either classes should give you > good guidance for that. > 2) have different operators and flows for each basic data type. This will > fan out your job, but should be the easier approach. > > Best, Fabian > > > > Am D

Filter with large key set

2020-01-15 by Jin Yi
Hi there, I have the following use case: a key set, say [A,B,C,], with around 10M entries; the type of the entries can be one of the types in BasicTypeInfo, e.g. String, Long, Integer, etc., and each message looks like below: message: { header: A body: {} } I would like to use Flink to fi
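For the simple case where the key set fits in each task's heap, a sketch of a RichFilterFunction that loads the keys once in open(); Tuple2 stands in for the message type with the header in f0, and loadKeys() is a hypothetical placeholder for the real lookup:

import java.util.HashSet;
import java.util.Set;
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

/** Keeps only records whose header field (f0) is in the large key set. */
public class KeySetFilter extends RichFilterFunction<Tuple2<String, String>> {
  private transient Set<String> keys;

  @Override
  public void open(Configuration parameters) throws Exception {
    keys = loadKeys();  // loaded once per parallel task, not per record
  }

  @Override
  public boolean filter(Tuple2<String, String> record) {
    return keys.contains(record.f0);
  }

  private Set<String> loadKeys() {
    // Placeholder: load the ~10M keys from a file or external store.
    return new HashSet<>();
  }
}

The mixed-type key set is the harder part; the reply above sketches the two options (a custom Either-style TypeInformation, or fanning out one flow per basic type).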

Re: MiniCluster question

2020-01-15 by Jin Yi
Hi, you can refer to org.apache.flink.streaming.api.environment.LocalStreamEnvironment::execute: public JobExecutionResult execute(String jobName) throws Exception { // transform the streaming program into a JobGraph StreamGraph streamGraph = getStreamGraph(); streamGraph.setJobName(jobName); JobGraph
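A minimal usage sketch: createLocalEnvironment() returns a LocalStreamEnvironment, whose execute() builds the JobGraph, starts a MiniCluster, submits the job, and waits for the result; the pipeline below is a placeholder:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MiniClusterDemo {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.createLocalEnvironment(2); // parallelism 2
    env.fromElements("a", "b", "c").print();
    env.execute("mini-cluster-demo");
  }
}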

Re: Fail to deploy flink on k8s in minikube

2020-01-15 by Jin Yi
Hi Jary, From the Flink website, it supports Flink Job Cluster deployment strategy on Kubernetes: https://github.com/apache/flink/blob/release-1.9/flink-container/kubernetes/README.md#deploy-flink-job-cluster Best Eleanore On Wed, Jan 15, 2020 at 3:18 AM Jary Zhen wrote: > Thanks to YangWang