Building Flink from source --- error resolving dependencies

2019-06-13 Thread syed
Hi; I am trying to build Apache Flink from source. I am strictly following the instructions given on the Flink GitHub page. https://github.com/apache/flink I am facing an error resolving dependencies for the project flink-mapr-fs. The details of the error are as below; [ERROR] Failed to

Flink end to end integration test

2019-06-13 Thread min.tan
Hi, I am new to Flink, at least to the testing part. We need an end to end integration test for a Flink job. Where can I find documentation for this? I am envisaging a test along these lines: 1) start a local job instance in an IDE or Maven test, 2) fire event JSONs to the data

Re: Flink end to end integration test

2019-06-13 Thread Konstantin Knauf
Hi Min, I recently published a small repository [1] containing examples of how to test Flink applications on different levels of the testing pyramid. It also contains one integration test, which spins up an embedded Flink cluster [2]. In contrast to your requirements, this test uses dedicated
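
A minimal sketch of such an embedded-cluster test, assuming JUnit 4 and the flink-test-utils dependency on the classpath (class names and the pipeline under test are illustrative, not taken from the repository [1]):

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.SinkFunction;
    import org.apache.flink.test.util.MiniClusterWithClientResource;
    import org.junit.ClassRule;
    import org.junit.Test;

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    import static org.junit.Assert.assertTrue;

    public class PipelineIntegrationTest {

        // Embedded Flink cluster, shared by all tests in this class.
        @ClassRule
        public static final MiniClusterWithClientResource FLINK_CLUSTER =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        @Test
        public void testIncrementPipeline() throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(2);

            CollectSink.VALUES.clear();
            env.fromElements(1L, 21L, 22L)
               .map(new IncrementMapFunction())
               .addSink(new CollectSink());
            env.execute();

            assertTrue(CollectSink.VALUES.containsAll(Arrays.asList(2L, 22L, 23L)));
        }

        private static class IncrementMapFunction implements MapFunction<Long, Long> {
            @Override
            public Long map(Long value) {
                return value + 1;
            }
        }

        // Test sink collecting results into a static list so the assertion
        // can inspect them after the job finishes.
        private static class CollectSink implements SinkFunction<Long> {
            static final List<Long> VALUES = Collections.synchronizedList(new ArrayList<>());

            @Override
            public void invoke(Long value, Context context) {
                VALUES.add(value);
            }
        }
    }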

java.lang.NoClassDefFoundError --- FlinkKafkaConsumer

2019-06-13 Thread syed
Hi, I am trying to add a Kafka source to the standard WordCount application that ships with Flink, so that the application may count words from the Kafka source. I added the source like this: DataStream<String> text; if (params.has("topic") && params.has("bootstrap.servers") && params.has("zookeeper.connect") && params.has("group.id")) {

Re: java.lang.NoClassDefFoundError --- FlinkKafkaConsumer

2019-06-13 Thread Yun Tang
Hi Syed Have you built your user application jar with all its dependencies, as [1] suggests, or copied the related connector jar from the flink-home/opt/ folder to the flink-home/lib/ folder? [1]
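
A sketch of wiring the consumer once the connector is actually on the runtime classpath; topic, servers, and group id are placeholders:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaWordCountSource {
        public static DataStream<String> kafkaLines(StreamExecutionEnvironment env) {
            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
            props.setProperty("group.id", "wordcount");               // placeholder

            // A NoClassDefFoundError at runtime usually means the connector class
            // was on the compile classpath but not shipped: either shade it into
            // the fat jar or copy the connector jar into flink-home/lib/.
            return env.addSource(
                    new FlinkKafkaConsumer<>("wordcount-topic", new SimpleStringSchema(), props));
        }
    }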

Re: Latency Monitoring in Flink application

2019-06-13 Thread Konstantin Knauf
Hi Roey, with latency tracking you will get a distribution of the time it took for LatencyMarkers to travel from each source operator to each downstream operator (by default one histogram per source operator in each non-source operator, see metrics.latency.granularity). LatencyMarkers are
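
A sketch of enabling it per job; the 1000 ms interval is an arbitrary example (it can also be set cluster-wide via metrics.latency.interval in flink-conf.yaml):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class LatencyTrackingSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Emit a LatencyMarker from each source every 1000 ms; downstream
            // operators then expose latency histograms through the metrics system.
            env.getConfig().setLatencyTrackingInterval(1000L);

            env.fromElements("a", "b", "c").print();
            env.execute("latency-tracking-example");
        }
    }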

Re: Latency Monitoring in Flink application

2019-06-13 Thread Timothy Victor
Thanks for the insight. I was also interested in this topic. One thought that occurred to me: what about the queuing delay when sending to your message bus (e.g. Kafka)? I am guessing the probe is taken before the message is added to the send queue? Thanks again Tim On Thu, Jun 13, 2019, 6:08

Latency Monitoring in Flink application

2019-06-13 Thread Halfon, Roey
Hi All, I'm looking for help regarding latency monitoring. Let's say I have a simple streaming data flow with the following operators: FlinkKafkaConsumer -> Map -> print. If I want to measure the latency of record processing in my dataflow, what would be the best approach? I want to get
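
One alternative to the built-in latency markers discussed in the replies above is measuring the lag yourself. A sketch of a map operator exposing event-time lag as a gauge, assuming records are Tuple2 values whose f0 field carries the event timestamp (metric name and types are illustrative):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Gauge;

    public class EventTimeLagMap extends RichMapFunction<Tuple2<Long, String>, Tuple2<Long, String>> {

        private transient long lastLagMillis;

        @Override
        public void open(Configuration parameters) {
            // Expose the most recent lag observation to the metrics system.
            getRuntimeContext().getMetricGroup()
                    .gauge("eventTimeLagMillis", (Gauge<Long>) () -> lastLagMillis);
        }

        @Override
        public Tuple2<Long, String> map(Tuple2<Long, String> event) {
            // Wall clock minus the record's event time approximates end-to-end
            // lag up to this operator (ignoring producer/Flink clock skew).
            lastLagMillis = System.currentTimeMillis() - event.f0;
            return event;
        }
    }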

Re: What happens when: high-availability.storageDir: is not available?

2019-06-13 Thread John Smith
Thanks. Does this folder need to be available for task nodes as well? Or just job nodes? On Wed., Jun. 12, 2019, 11:56 p.m. Yun Tang, wrote: > The high availability storage directory would store the > submitted job graph and completed checkpoints. If this directory is > unavailable

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-13 Thread Felipe Gutierrez
Hi Hequn, indeed the ReduceFunction is better than the ProcessWindowFunction. I replaced it and could verify the performance improvement [1]. Thanks for that! I will try a distinct count with the Table API. The issue I am facing now is that I want to use a HyperLogLog in a UDF for the DataStream API.

Re: Loading state from sink/understanding recovery options

2019-06-13 Thread Eduardo Winpenny Tejedor
Could someone please comment on this question, in either direction? Thanks, Eduardo On Sat, 8 Jun 2019, 14:10 Eduardo Winpenny Tejedor, < eduardo.winpe...@gmail.com> wrote: > Hi, > > In generic terms, if a keyed operator outputs its state into a sink, and > the state of that

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-13 Thread Felipe Gutierrez
Hmm, it seems that it is my turn to implement all this using the Table API. Thanks Rong! -- Felipe Gutierrez -- skype: felipe.o.gutierrez -- https://felipeogutierrez.blogspot.com On Thu, Jun 13, 2019 at 6:00 PM Rong Rong wrote: > Hi

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-13 Thread Rong Rong
Hi Felipe, Hequn is right. The problem you are facing is better solved with Table API level code instead of the DataStream API; you will have more Flink library support to achieve your goal. In addition, the Flink Table API also supports a user-defined AggregateFunction [1] to achieve your HyperLogLog
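
A minimal sketch of such a user-defined aggregate for the Table API; the accumulator here is a plain HashSet for an exact count, and the assumption is that a real HyperLogLog sketch (from a sketching library) would slot into the same skeleton:

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.flink.table.functions.AggregateFunction;

    // Exact distinct count as a stand-in; a HyperLogLog variant would replace
    // the Set accumulator with a sketch and return its cardinality estimate.
    public class CountDistinct extends AggregateFunction<Long, CountDistinct.SetAcc> {

        public static class SetAcc {
            public Set<String> seen = new HashSet<>();
        }

        @Override
        public SetAcc createAccumulator() {
            return new SetAcc();
        }

        @Override
        public Long getValue(SetAcc acc) {
            return (long) acc.seen.size();
        }

        // Invoked by the runtime for each input row (by naming convention).
        public void accumulate(SetAcc acc, String value) {
            if (value != null) {
                acc.seen.add(value);
            }
        }
    }

Once registered with the TableEnvironment (e.g. tableEnv.registerFunction("countDistinct", new CountDistinct())), it can be used like a built-in aggregate.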

Re: Flink end to end intergration test

2019-06-13 Thread Theo Diefenthal
Hi Min, In order to run "clean" integration tests with Kafka, I set up a JUnit rule for building up Kafka (as mentioned by Konstantin), but I also use my own Kafka deserializer (by extending my project's custom deserializer) like so: public class TestDeserializer extends
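
A sketch of what such a test deserializer can look like, assuming the production deserializer implements Flink's KafkaDeserializationSchema (the end-of-stream trick bounds an otherwise unbounded test job; names are illustrative):

    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    // Test-only variant: behaves like the production deserializer but reports
    // end-of-stream after a fixed number of records so the test job terminates.
    public class TestDeserializer implements KafkaDeserializationSchema<String> {

        private final int expectedRecords;
        private int seenRecords;

        public TestDeserializer(int expectedRecords) {
            this.expectedRecords = expectedRecords;
        }

        @Override
        public String deserialize(ConsumerRecord<byte[], byte[]> record) {
            seenRecords++;
            return new String(record.value(), StandardCharsets.UTF_8);
        }

        @Override
        public boolean isEndOfStream(String nextElement) {
            return seenRecords >= expectedRecords;
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return TypeInformation.of(String.class);
        }
    }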

Re: Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-13 Thread Vishal Santoshi
I guess adjusting the pattern (blacklisting the topic/s) would work. On Thu, Jun 13, 2019 at 3:02 PM Vishal Santoshi wrote: > Given
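
A sketch of that pattern adjustment, using a negative lookahead to blacklist the deleted topic (topic names and properties are placeholders):

    import java.util.Properties;
    import java.util.regex.Pattern;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class BlacklistedTopicPattern {
        public static FlinkKafkaConsumer<String> consumer(Properties kafkaProps) {
            // Match every "events-*" topic except the deleted one, so the
            // pattern-based consumer's metadata fetches cannot auto-recreate it.
            Pattern pattern = Pattern.compile("events-(?!deleted-topic$).*");
            return new FlinkKafkaConsumer<>(pattern, new SimpleStringSchema(), kafkaProps);
        }
    }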

Deletion of topic from a pattern in KafkaConsumer if auto-create is true will always recreate the topic

2019-06-13 Thread Vishal Santoshi
Given https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L471 it seems that if (a) I have a regex pattern for consuming a bunch of topics and (b) auto create is turned on, then

Re: What happens when: high-availability.storageDir: is not available?

2019-06-13 Thread Yun Tang
Besides the job graph and completed checkpoints, the high availability storage directory also stores blob data, which is accessed from both jobmanager and taskmanager nodes; you can refer to [1] for the BLOB storage architecture. [1]

Re: Source code question - about the logic of calculating network buffer

2019-06-13 Thread 徐涛
Hi Yun, Thanks a lot for the detailed and clear explanation, that is very helpful. Best Henry > On Jun 13, 2019, at 10:32 AM, Yun Gao wrote: > > Hi Tao, > > As a whole, `networkBufBytes` is not part of the heap. In fact, it is > allocated from the direct memory. The rough relationship
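
For readers without the quoted thread, the gist of that explanation as a rough sketch (variable names follow the Flink source only loosely; the point is the fraction-then-clamp logic over direct, not heap, memory):

    // The network buffer pool is sized as a fraction of total memory, clamped
    // between a configured min and max, and allocated from direct (off-heap)
    // memory rather than the JVM heap.
    public final class NetworkBufferSketch {
        static long networkBufBytes(long totalMemoryBytes,
                                    float networkBufFraction,
                                    long networkBufMin,
                                    long networkBufMax) {
            long fromFraction = (long) (totalMemoryBytes * networkBufFraction);
            return Math.min(networkBufMax, Math.max(networkBufMin, fromFraction));
        }
    }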

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-13 Thread Hequn Cheng
Hi Felipe, From your code, I think you want to get the "count distinct" result instead of the "distinct count"; they carry different meanings. To improve performance, you can replace your DistinctProcessWindowFunction with a DistinctProcessReduceFunction. A ReduceFunction can aggregate the
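
A sketch of the incremental idea, written here with the DataStream AggregateFunction (a close cousin of combining a ReduceFunction with a window function; types and the window choice are illustrative):

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.flink.api.common.functions.AggregateFunction;

    // Folds each element into the accumulator as it arrives, instead of
    // buffering the whole window for a ProcessWindowFunction; only the
    // accumulator is kept in window state.
    public class DistinctCountAggregate implements AggregateFunction<String, Set<String>, Long> {

        @Override
        public Set<String> createAccumulator() {
            return new HashSet<>();
        }

        @Override
        public Set<String> add(String value, Set<String> acc) {
            acc.add(value);
            return acc;
        }

        @Override
        public Long getResult(Set<String> acc) {
            return (long) acc.size();
        }

        @Override
        public Set<String> merge(Set<String> a, Set<String> b) {
            a.addAll(b);
            return a;
        }
    }

Applied as input.keyBy(...).window(...).aggregate(new DistinctCountAggregate()).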

Re: Loading state from sink/understanding recovery options

2019-06-13 Thread Congxian Qiu
Hi, Eduardo Currently, we can't load state from the outside (there is an ongoing JIRA [1] to do this); in other words, if you disable checkpointing and use Kafka/a database as your state storage, you have to handle deduplication yourself. Just curious, which state backend do you use,

Re: flink

2019-06-13 Thread 王志明
Hi: If these delayed records arrive only after the window has been evaluated, Flink will drop them, but you can use allowedLateness and Side Outputs to capture the delayed data and then process it separately. On 2019-05-13 15:35:33, "Kobeli" wrote: > Hello, a question about Flink watermarks. > A Flink job aggregates metrics on event time; the source is multi-partition Kafka. If the data of one partition is delayed for some reason (e.g. under-replication), > Flink
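
A sketch of the allowedLateness plus side-output pattern mentioned above; window size, lateness bound, and record types are placeholders:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.OutputTag;

    public class LateDataRouting {
        public static void wire(DataStream<Tuple2<String, Long>> events) {
            // Anonymous subclass keeps the generic type information at runtime.
            final OutputTag<Tuple2<String, Long>> lateTag =
                    new OutputTag<Tuple2<String, Long>>("late-events") {};

            SingleOutputStreamOperator<Tuple2<String, Long>> aggregated = events
                    .keyBy(0)
                    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                    .allowedLateness(Time.minutes(5))   // still update for records this late
                    .sideOutputLateData(lateTag)        // anything later goes here
                    .sum(1);

            // Records too late even for allowedLateness are no longer silently
            // dropped; they can be logged or re-processed from the side output.
            DataStream<Tuple2<String, Long>> lateEvents = aggregated.getSideOutput(lateTag);
            lateEvents.print();
        }
    }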