Re: Queryable State max number of clients

2017-08-14 Thread Ufuk Celebi
This is as Aljoscha describes. Each thread can handle many different clients at the same time. You shouldn't need to change the defaults in most cases. The network threads handle the TCP connections and dispatch query tasks to the query threads which do the actual querying of the state backend.

Re: Queryable State max number of clients

2017-08-14 Thread Ziyad Muhammed
Hi Aljoscha, Ufuk Thank you for the replies. I'm using RocksDB state backend. Could you please explain the blocking I/O calls mentioned? when will it happen? And what will be the effect? A timeout exception? Best Ziyad On Mon, Aug 14, 2017 at 10:17 PM, Ufuk Celebi wrote: >

Re: Flink - Handling late events - main vs late window firing

2017-08-14 Thread M Singh
Thanks Aljoscha for your response. Just to clarify - the only way to handle the duplication scenario properly is by using the ProcessWindowFunction - there is no high level function for this. Thanks again. On Wednesday, August 9, 2017 6:26 AM, Aljoscha Krettek wrote:

Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread Hai Zhou
I would like to ask what is “PB object”? Thanks. Hai Zhou > 在 2017年8月15日,09:53,mingleizhang <18717838...@163.com> 写道: > > Thanks, Nico. I tried flink1.3.2. Works now. Thank you very much! I think > there should be something else to cause this error to happen. Not only the PR > I patched

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Kien Truong
Hi, Admittedly, I have not suggested this because I thought it was not available for batch API. Regards, Kien On Aug 15, 2017, 00:06, at 00:06, Nico Kruber wrote: >Hi Eranga and Kien, >Flink supports asynchronous IO since version 1.2, see [1] for details. > >You

Automate Job submission with container

2017-08-14 Thread Biswajit Das
Hi There, A few weeks back I have posted here regarding flink docker container running on mesos. I'm able to run the same successfully; I'm heading towards full CI/CD deployment with the marathon, most of our deployment is automated via the marathon, and sometimes even I trigger metrics based

Re:Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread mingleizhang
Thanks, Nico. I tried flink1.3.2. Works now. Thank you very much! I think there should be something else to cause this error to happen. Not only the PR I patched before. Thanks. mingleizhang At 2017-08-15 00:29:28, "Nico Kruber" wrote: >Just to be sure, can you

Time zones problem

2017-08-14 Thread Alexander Smirnov
Hello everybody, I’m exploring Flink options to build statistics engine for call center solution. One thing I’m not sure how to implement. Currently have the following jobs in the architecture. Job #1 – is for matching start and end events and calculate durations. Like having call started and

Re: Standalone cluster - taskmanager settings ignored

2017-08-14 Thread Nico Kruber
Hi Marc, the master, i.e. JobManager, does not need to know which clients, i.e. TaskManager, are supposed to connect to it. Indeed, only the task managers need to know where to connect to and they will try to establish that connection and re-connect when losing it. Nico On Friday, 11 August

Re: Error during Kafka connection

2017-08-14 Thread Tzu-Li (Gordon) Tai
Hi, I don’t have experience running Kafka clusters behind proxies, but it seems like the configurations “advertised.host.name” and “advertised.port” for your Kafka brokers are what you’re looking for. For information on that please refer to the Kafka documentations. Cheers, Gordon On 12

Re: Flink Data Streaming to S3

2017-08-14 Thread vinay patil
Hi, Yes, I am able to write to S3 using DataStream API. I have answered you the approach on SO Regards, Vinay Patil On Mon, Aug 14, 2017 at 4:21 AM, ant burton [via Apache Flink User Mailing List archive.] wrote: > Hello, > > Has anybody been able to write

Re: Flink Data Streaming to S3

2017-08-14 Thread ant burton
Thanks Vinay, After jar xf jar-file the JAR file, and then find . -name "*.xml", I can only locate ./core-default.xml, updating this file does not resolve the issue, am I missing something? Should the core-site.xml be placed in the root of the jar or else where? :-) > On 14 Aug 2017, at

Re: Queryable State max number of clients

2017-08-14 Thread Aljoscha Krettek
Hi, I think the number of network treads and number of query threads only roughly correlate with the number of clients that can query in parallel since this is using asynchronous communication via Akka/Netty. Of course, increasing that number means there can be more connections but I think

Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Zor X.L.
Bump... 在 2017/8/11 9:36, Zor X.L. 写道: Hi, What we want to do is cancelling the Flink job after all upstream data were processed. We use Kafka as our input and output, and use the SQL capability of Table API by the way. A possible solution is: * embed a stop message at the tail of

Re: Writing on Cassandra

2017-08-14 Thread Nico Kruber
If I see this correctly in the code, the CassandraSink is using the value of its input stream automatically, so in your case Tuple2> What you want is it to use only Tuple6 without the first

kerberos yarn - failure in long running streaming application

2017-08-14 Thread Prabhu V
Hi, I am running Flink-1.3.2 on yarn (Cloudera 2.6.0-cdh5.7.6). The application stream data from kafka, groups by key, creates a session window and writes to HDFS using a rich window function in the "window.apply" method. The rich window function creates the sequence file thus

For wierdos trying to use OpenCV in Flink Stream

2017-08-14 Thread Trevor Grant
I'm new with JNIs / Classloaders /etc, and had a hard time making this work right. Lots of posts referenced the Tomcat solution, but it took me a while to "get it". And so, for newbs like me, here is a quick bloggette on getting OpenCV (and theoretically any JNIs) working in your Flink stream.

Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter
Hi, > > Also, in the same line, can someone detail the difference between State > Backend & External checkpoint? > Those are two very different things. If we talk about state backends in Flink, we mean the entity that is responsible for storing and managing the state inside an operator.

Aggregation on multiple Key combinations and multiple Windows

2017-08-14 Thread Basanth Gowda
Hello, Posted this yesterday, but not sure if it went through or not. I am fairly new to Flink. I have a use case which needs aggregation on different combination of keys and windowing for different intervals. I searched through but couldn't find anything that could help. Came across this model

Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter
Just noticed that I forgot to include also a reference to the documentation about externalized checkpoints: https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html > Am 14.08.2017 um

Re: Serialization problem: Using generic that extends a class on POJO.

2017-08-14 Thread Timo Walther
Hi Ido, at the first glance, I could not find any problem in your code. So it might be a bug. The "environment.registerType()" is not needed in your case, because you have no generic types. I will have a closer look at it tomorrow. Regards, Timo Am 14.08.17 um 16:35 schrieb Ido Bar Av:

Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Nico Kruber
Hi, have you tried letting your source also implement the StoppableFunction interface as suggested by the SourceFunction javadoc? If a source is stopped, e.g. after identifying some special signal from the outside, it will continue processing all remaining events and the Flink program will

Serialization problem: Using generic that extends a class on POJO.

2017-08-14 Thread Ido Bar Av
Hi, We're using flink 1.3.1, and we're trying to pass through the pipeline a POJO object that has a generic field )see details in the complete example below): We have the class Foo, and when sending a subclass with a specific SomeKey, we get the following exception:

Re: Standalone cluster - taskmanager settings ignored

2017-08-14 Thread Stephan Ewen
The scripts and the masters/slaves files are only relevant to the scripts which SSH to the machines to start/stop the processes. They have not really an impact on how the processes find each other. Calling them repeatedly and editing them can start additional processes, or not stop all processes.

Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-14 Thread Chao Wang
Hi, I'd like to know if CoFlatMapFunction/CoProcessFunction is thread-safe, and to what extent? What's the difference between the two Functions? and in general, how does Flink prevent race conditions? Here's my case: I tried to condition on two input streams and produce the third stream if

Re: Aggregation by key hierarchy

2017-08-14 Thread Nico Kruber
Hi Basanth, Let's assume you have records of the form Record = {timestamp, country, state, city, value} Then you'd like to create aggregates, e.g. the average, for the following combinations? 1) avg per country 2) avg per state and country 3) avg per city and state and country * You could create

Re: One large WindowFunction vs. several smaller ones

2017-08-14 Thread Nico Kruber
Hi Maarten, If the Count-WF is counting the number of events per window and the Diff-WF is just comparing this number to the output of the previous window, then you do not need a WindowFunction for the Diff-WF afterall: Just use your Count-WF and plug in a stateful map (also see [1]) afterwards

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Nico Kruber
Hi Eranga and Kien, Flink supports asynchronous IO since version 1.2, see [1] for details. You basically pack your URL download into the asynchronous part and collect the resulting string for further processing in your pipeline. Nico [1]

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Kien Truong
Hi, While this task is quite trivial to do with Flink Dataset API, using readTextFile to read the input and a flatMap function to perform the downloading, it might not be a good idea. The download process is I/O bound, and will block the synchronous flatMap function, so the throughput

Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread Nico Kruber
Just to be sure, can you try flink 1.3.2 which is supposed to fix FLINK-7133 and was released recently? Nico On Monday, 14 August 2017 03:19:06 CEST mingleizhang wrote: > BTW, My elastic search version is 2.3.3, not the jira FLINK-7133 by 1.7.1. > And I found 2.3.3 is not based on asm. My

Re: kerberos yarn - failure in long running streaming application

2017-08-14 Thread Eron Wright
It sounds to me that the TGT is expiring (usually after 12 hours). This shouldn't happen in the keytab scenario because of a background thread provided by Hadoop that periodically performs a re-login using the keytab. More details on the Hadoop internals here: