Re: Flink - Handling late events - main vs late window firing

2017-08-14 Thread Aljoscha Krettek
I would say so, yes. But I don't consider ProessWindowFunction to be low-level, it's just the function that should be used for processing windows if you need more information about context. Best, Aljoscha > On 14. Aug 2017, at 22:53, M Singh wrote: > > Thanks Aljoscha for your response. > >

Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread Hai Zhou
I would like to ask what is “PB object”? Thanks. Hai Zhou > 在 2017年8月15日,09:53,mingleizhang <18717838...@163.com> 写道: > > Thanks, Nico. I tried flink1.3.2. Works now. Thank you very much! I think > there should be something else to cause this error to happen. Not only the PR > I patched before

Time zones problem

2017-08-14 Thread Alexander Smirnov
Hello everybody, I’m exploring Flink options to build statistics engine for call center solution. One thing I’m not sure how to implement. Currently have the following jobs in the architecture. Job #1 – is for matching start and end events and calculate durations. Like having call started and c

Re:Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread mingleizhang
Thanks, Nico. I tried flink1.3.2. Works now. Thank you very much! I think there should be something else to cause this error to happen. Not only the PR I patched before. Thanks. mingleizhang At 2017-08-15 00:29:28, "Nico Kruber" wrote: >Just to be sure, can you try flink 1.3.2 which is

Automate Job submission with container

2017-08-14 Thread Biswajit Das
Hi There, A few weeks back I have posted here regarding flink docker container running on mesos. I'm able to run the same successfully; I'm heading towards full CI/CD deployment with the marathon, most of our deployment is automated via the marathon, and sometimes even I trigger metrics based auto

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Kien Truong
Hi, Admittedly, I have not suggested this because I thought it was not available for batch API. Regards, Kien On Aug 15, 2017, 00:06, at 00:06, Nico Kruber wrote: >Hi Eranga and Kien, >Flink supports asynchronous IO since version 1.2, see [1] for details. > >You basically pack your URL downlo

Re: Flink - Handling late events - main vs late window firing

2017-08-14 Thread M Singh
Thanks Aljoscha for your response. Just to clarify - the only way to handle the duplication scenario properly is by using the ProcessWindowFunction - there is no high level function for this. Thanks again. On Wednesday, August 9, 2017 6:26 AM, Aljoscha Krettek wrote: Hi, 1. You could

Re: Queryable State max number of clients

2017-08-14 Thread Ziyad Muhammed
Hi Aljoscha, Ufuk Thank you for the replies. I'm using RocksDB state backend. Could you please explain the blocking I/O calls mentioned? when will it happen? And what will be the effect? A timeout exception? Best Ziyad On Mon, Aug 14, 2017 at 10:17 PM, Ufuk Celebi wrote: > This is as Aljosch

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Eranga Heshan
Thanks for your quick replies, Nico and Kien. Since I am using Flink-1.3.0, I will try Nico's idea. I might bug you again for my future problems. 😊 Regards, Eranga Heshan *Undergraduate* Computer Science & Engineering University of Moratuwa Mobile: +94 71 138 2686 <%2B94%2071%20552%202087> Ema

Re: Queryable State max number of clients

2017-08-14 Thread Ufuk Celebi
This is as Aljoscha describes. Each thread can handle many different clients at the same time. You shouldn't need to change the defaults in most cases. The network threads handle the TCP connections and dispatch query tasks to the query threads which do the actual querying of the state backend. In

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Nico Kruber
Hi Eranga and Kien, Flink supports asynchronous IO since version 1.2, see [1] for details. You basically pack your URL download into the asynchronous part and collect the resulting string for further processing in your pipeline. Nico [1] https://ci.apache.org/projects/flink/flink-docs-releas

Re: One large WindowFunction vs. several smaller ones

2017-08-14 Thread Nico Kruber
Hi Maarten, If the Count-WF is counting the number of events per window and the Diff-WF is just comparing this number to the output of the previous window, then you do not need a WindowFunction for the Diff-WF afterall: Just use your Count-WF and plug in a stateful map (also see [1]) afterwards

Re: Aggregation by key hierarchy

2017-08-14 Thread Nico Kruber
Hi Basanth, Let's assume you have records of the form Record = {timestamp, country, state, city, value} Then you'd like to create aggregates, e.g. the average, for the following combinations? 1) avg per country 2) avg per state and country 3) avg per city and state and country * You could create

Flink multithreading, CoFlatMapFunction, CoProcessFunction, internal state

2017-08-14 Thread Chao Wang
Hi, I'd like to know if CoFlatMapFunction/CoProcessFunction is thread-safe, and to what extent? What's the difference between the two Functions? and in general, how does Flink prevent race conditions? Here's my case: I tried to condition on two input streams and produce the third stream if t

Re: kerberos yarn - failure in long running streaming application

2017-08-14 Thread Eron Wright
It sounds to me that the TGT is expiring (usually after 12 hours). This shouldn't happen in the keytab scenario because of a background thread provided by Hadoop that periodically performs a re-login using the keytab. More details on the Hadoop internals here: https://stackoverflow.com/a/346910

Re: IllegalArgumentException when using elasticsearch as a sink

2017-08-14 Thread Nico Kruber
Just to be sure, can you try flink 1.3.2 which is supposed to fix FLINK-7133 and was released recently? Nico On Monday, 14 August 2017 03:19:06 CEST mingleizhang wrote: > BTW, My elastic search version is 2.3.3, not the jira FLINK-7133 by 1.7.1. > And I found 2.3.3 is not based on asm. My flink

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Kien Truong
Hi, While this task is quite trivial to do with Flink Dataset API, using readTextFile to read the input and a flatMap function to perform the downloading, it might not be a good idea. The download process is I/O bound, and will block the synchronous flatMap function, so the throughput will

Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Nico Kruber
Hi, have you tried letting your source also implement the StoppableFunction interface as suggested by the SourceFunction javadoc? If a source is stopped, e.g. after identifying some special signal from the outside, it will continue processing all remaining events and the Flink program will shu

Re: Serialization problem: Using generic that extends a class on POJO.

2017-08-14 Thread Timo Walther
Hi Ido, at the first glance, I could not find any problem in your code. So it might be a bug. The "environment.registerType()" is not needed in your case, because you have no generic types. I will have a closer look at it tomorrow. Regards, Timo Am 14.08.17 um 16:35 schrieb Ido Bar Av: Hi

Re: Standalone cluster - taskmanager settings ignored

2017-08-14 Thread Stephan Ewen
The scripts and the masters/slaves files are only relevant to the scripts which SSH to the machines to start/stop the processes. They have not really an impact on how the processes find each other. Calling them repeatedly and editing them can start additional processes, or not stop all processes.

Serialization problem: Using generic that extends a class on POJO.

2017-08-14 Thread Ido Bar Av
Hi, We're using flink 1.3.1, and we're trying to pass through the pipeline a POJO object that has a generic field )see details in the complete example below): We have the class Foo, and when sending a subclass with a specific SomeKey, we get the following exception: java.lang.RuntimeException:

Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter
Just noticed that I forgot to include also a reference to the documentation about externalized checkpoints: https://ci.apache.org/projects/flink/flink-docs-master/ops/state/checkpoints.html > Am 14.08.2017 um 1

Re: [EXTERNAL] difference between checkpoints & savepoints

2017-08-14 Thread Stefan Richter
Hi, > > Also, in the same line, can someone detail the difference between State > Backend & External checkpoint? > Those are two very different things. If we talk about state backends in Flink, we mean the entity that is responsible for storing and managing the state inside an operator. Th

Aggregation on multiple Key combinations and multiple Windows

2017-08-14 Thread Basanth Gowda
Hello, Posted this yesterday, but not sure if it went through or not. I am fairly new to Flink. I have a use case which needs aggregation on different combination of keys and windowing for different intervals. I searched through but couldn't find anything that could help. Came across this model

For wierdos trying to use OpenCV in Flink Stream

2017-08-14 Thread Trevor Grant
I'm new with JNIs / Classloaders /etc, and had a hard time making this work right. Lots of posts referenced the Tomcat solution, but it took me a while to "get it". And so, for newbs like me, here is a quick bloggette on getting OpenCV (and theoretically any JNIs) working in your Flink stream. h

Re: Queryable State max number of clients

2017-08-14 Thread Aljoscha Krettek
Hi, I think the number of network treads and number of query threads only roughly correlate with the number of clients that can query in parallel since this is using asynchronous communication via Akka/Netty. Of course, increasing that number means there can be more connections but I think even

Re: Standalone cluster - taskmanager settings ignored

2017-08-14 Thread Nico Kruber
Hi Marc, the master, i.e. JobManager, does not need to know which clients, i.e. TaskManager, are supposed to connect to it. Indeed, only the task managers need to know where to connect to and they will try to establish that connection and re-connect when losing it. Nico On Friday, 11 August 2

Re: Writing on Cassandra

2017-08-14 Thread Nico Kruber
If I see this correctly in the code, the CassandraSink is using the value of its input stream automatically, so in your case Tuple2> What you want is it to use only Tuple6 without the first first part of Tuple2 (probably your key?). A simple map function before adding the sink should to the tric

Re: How can I cancel a Flink job safely without a special stop message in the stream?

2017-08-14 Thread Zor X.L.
Bump... 在 2017/8/11 9:36, Zor X.L. 写道: Hi, What we want to do is cancelling the Flink job after all upstream data were processed. We use Kafka as our input and output, and use the SQL capability of Table API by the way. A possible solution is: * embed a stop message at the tail of

Re: Flink Data Streaming to S3

2017-08-14 Thread ant burton
Thanks Vinay, After jar xf jar-file the JAR file, and then find . -name "*.xml", I can only locate ./core-default.xml, updating this file does not resolve the issue, am I missing something? Should the core-site.xml be placed in the root of the jar or else where? :-) > On 14 Aug 2017, at 07:3

Re: kerberos yarn - failure in long running streaming application

2017-08-14 Thread Ted Yu
bq. security.kerberos.login.contexts: Client,KafkaClien Just curious: there is missing 't' at the end of the above line. Maybe a typo when composing the email ? On Sun, Aug 13, 2017 at 11:15 PM, Prabhu V wrote: > Hi, > > I am running Flink-1.3.2 on yarn (Cloudera 2.6.0-cdh5.7.6). The > applica