Re: windowAll and AggregateFunction

2019-01-09 Thread CPC
Hi Ken, I am doing a global distinct. What I want to achieve is something like the following. With windowAll, all data is sent to a single operator, which means shuffling all data and calculating with parallelism 1. I don't want to shuffle the data, since I just want to feed it to an HLL instance and shuffle just the HLL instances at
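The two-stage scheme sketched in this thread (build one sketch per parallel instance, then shuffle only the sketches) works because HyperLogLog registers merge by element-wise maximum. A minimal plain-Java illustration of that merge step, using toy 4-register arrays rather than a real HLL implementation:

```java
import java.util.Arrays;

// Toy illustration of why shuffling only HLL sketches is cheap:
// two HyperLogLog register arrays merge by taking the element-wise
// maximum, so each parallel source can build its own sketch locally
// and only the small fixed-size register arrays need to be shuffled
// to the parallelism-1 merge operator.
public class HllMerge {
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sketchFromSource1 = {3, 0, 5, 1};
        int[] sketchFromSource2 = {1, 4, 2, 6};
        // Prints [3, 4, 5, 6]
        System.out.println(Arrays.toString(merge(sketchFromSource1, sketchFromSource2)));
    }
}
```

Because the merge is associative and commutative, it does not matter in which order the per-source sketches arrive at the final operator.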

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Wenrui Meng
Hi Zhijiang, Thanks for your reply. I also first suspected the same cause. But once I read the connected host's log, the netty server starts listening on the correct port within 2 seconds of the task manager start. I compared the logs of the connected host and the connecting host; it seems requesting

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Wenrui Meng
Hi Till, I will try the local test according to your suggestion. The Uber Flink version mainly adds things to integrate with Uber deployment and other infra components. There is no change to Flink's original code flow. I also found that the issue can be avoided with the same setting in other

Custom Serializer for Avro GenericRecord

2019-01-09 Thread Gagan Agrawal
Hi, I am using Avro GenericRecord for most of the IN/OUT types of my custom functions. What I have noticed is that the default Avro GenericRecord serializer also serializes the schema, which makes messages very heavy and hence impacts overall performance. In my case I already know the schema beforehand

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread zhijiang
Hi Wenrui, I suspect another issue which might cause the connection failure. You can check whether the netty server already binds and listens on the port successfully in time before the client requests the connection. If there exists some time-consuming process during TM startup which might delay netty

Multiple MapState vs single nested MapState in stateful Operator

2019-01-09 Thread Gagan Agrawal
Hi, I have a use case where 4 streams get merged (union), grouped on a common key (keyBy), and a custom KeyedProcessFunction is called. Now I need to keep state (RocksDB backend) for all 4 streams in my custom KeyedProcessFunction, where each of these 4 streams would be stored as a map. So I have 2
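The two layouts being weighed here can be illustrated in plain Java (this is not Flink's MapState API, just the data shapes): one map per stream, versus a single map keyed by a composite (streamId, key). With the RocksDB backend each user-map entry becomes its own RocksDB key either way, so the choice is mostly about access patterns rather than storage size:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the two candidate state layouts.
public class StateLayouts {
    // Layout A: one map per input stream (analogous to four MapState
    // descriptors in the KeyedProcessFunction).
    Map<String, Long> stream1State = new HashMap<>();
    Map<String, Long> stream2State = new HashMap<>();

    // Layout B: a single map with a composite key (analogous to one
    // MapState whose key encodes the stream id).
    Map<String, Long> combinedState = new HashMap<>();

    // Hypothetical composite-key encoding for layout B; "|" is an
    // arbitrary separator chosen for this sketch.
    static String compositeKey(int streamId, String key) {
        return streamId + "|" + key;
    }
}
```

Layout A keeps per-stream iteration cheap (you can scan one stream's entries without filtering); layout B makes cross-stream lookups for the same logical key a single map access per stream id.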

Re: windowAll and AggregateFunction

2019-01-09 Thread Ken Krugler
> On Jan 9, 2019, at 3:10 PM, CPC wrote: > > Hi Ken, > > From regular time-based windows do you mean keyed windows? Correct. Without doing a keyBy() you would have a parallelism of 1. I think you want to key on whatever you’re counting for unique values, so that each window operator gets a

Producing binary Avro to Kafka

2019-01-09 Thread Elliot West
Hello, What is the recommended flink streaming approach for serialising a POJO to Avro according to a schema, and pushing the subsequent byte array into a Kafka sink? Also, is there any existing approach for prepending the schema id to the payload (following the Confluent pattern)? Thanks,
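Regarding the Confluent pattern mentioned above: the Confluent wire format is a magic byte 0x00, then the 4-byte big-endian schema registry id, then the Avro-encoded payload. A minimal sketch of just the framing step (the payload bytes below are placeholders; in a real job they would come from an Avro BinaryEncoder):

```java
import java.nio.ByteBuffer;

// Sketch of the Confluent wire format framing: [0x00][schema id: 4 bytes BE][payload].
public class ConfluentFraming {
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put((byte) 0x00);   // magic byte
        buf.putInt(schemaId);   // ByteBuffer writes big-endian by default
        buf.put(avroPayload);   // Avro binary-encoded record bytes
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] framed = frame(42, new byte[]{1, 2, 3});
        System.out.println(framed.length); // 1 + 4 + 3 = 8
    }
}
```

The resulting byte array is what would be handed to the Kafka sink's serialization schema as the record value.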

Re: windowAll and AggregateFunction

2019-01-09 Thread Ken Krugler
Hi there, You should be able to use a regular time-based window(), and emit the HyperLogLog binary data as your result, which then would get merged in your custom function (which you set a parallelism of 1 on). Note that if you are generating unique counts per non-overlapping time window,

Re: NoClassDefFoundError javax.xml.bind.DatatypeConverterImpl

2019-01-09 Thread Mike Mintz
For what it's worth, we believe we are able to work around this issue by adding the following line to our flink-conf.yaml: classloader.parent-first-patterns.additional: javax.xml.;org.apache.xerces. On Thu, Dec 6, 2018 at 2:28 AM Chesnay Schepler wrote: > Small correction: Flink 1.7 does not
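For reference, the workaround above as it would appear as a flink-conf.yaml fragment (patterns are semicolon-separated prefixes):

```yaml
# flink-conf.yaml: resolve javax.xml.* and org.apache.xerces.* classes
# from the parent (system) classloader instead of the user-code
# classloader, avoiding the NoClassDefFoundError described above.
classloader.parent-first-patterns.additional: javax.xml.;org.apache.xerces.
```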

Re: windowAll and AggregateFunction

2019-01-09 Thread CPC
Hi Stefan, Could I use the "Reinterpreting a pre-partitioned data stream as keyed stream" feature for this? On Wed, 9 Jan 2019 at 17:50, Stefan Richter wrote: > Hi, > > I think your expectation about windowAll is wrong, from the method > documentation: “Note: This operation is inherently

RE: Kerberos error when restoring from HDFS backend after 24 hours

2019-01-09 Thread LINZ, Arnaud
Hi, I've managed to correct this by implementing my own FsStateBackend, based on the original one, with a proper Kerberos relogin in createCheckpointStorage(). Regards, Arnaud -Original Message- From: LINZ, Arnaud Sent: Friday, 4 January 2019 11:32 To: user Subject: Kerberos error when

Re: Can checkpoints be used to migrate jobs between Flink versions ?

2019-01-09 Thread Edward Alexander Rojas Clavijo
Thanks very much for your rapid answer, Stefan. Regards, Edward On Wed, Jan 9, 2019 at 15:26, Stefan Richter () wrote: > Hi, > > I would assume that this should currently work because the format of basic > savepoints and checkpoints is the same right now. The restriction in the > doc is

Re: Flink Streaming Job Task is getting cancelled silently and causing job to restart

2019-01-09 Thread sohimankotia
Hi Stefan, Attaching logs: you can search for "2019-01-09 19:34:44,170 INFO org.apache.flink.runtime.taskmanager.Task - Attempting to cancel task Source: " in the first 2 log files. f3-part-aa.gz

Re: [DISCUSS] Dropping flink-storm?

2019-01-09 Thread Till Rohrmann
With https://issues.apache.org/jira/browse/FLINK-10571, we will remove the Storm topologies from Flink and keep the wrappers for the moment. However, looking at the FlinkTopologyContext [1], it becomes quite obvious that Flink's compatibility with Storm is really limited. Almost all of the

Re: Flink Streaming Job Task is getting cancelled silently and causing job to restart

2019-01-09 Thread Stefan Richter
Hi, Could you also provide the job master log? Best, Stefan > On 9. Jan 2019, at 12:02, sohimankotia wrote: > > Hi, > > I am running Flink Streaming Job with 1.5.5 version. > > - Job is basically reading from Kafka , windowing on 2 minutes , and writing > to hdfs using AvroBucketing Sink .

Re: windowAll and AggregateFunction

2019-01-09 Thread Stefan Richter
Hi, I think your expectation about windowAll is wrong, from the method documentation: “Note: This operation is inherently non-parallel since all elements have to pass through the same operator instance” and I also cannot think of a way in which the windowing API would support your use case

Re: Can checkpoints be used to migrate jobs between Flink versions ?

2019-01-09 Thread Stefan Richter
Hi, I would assume that this should currently work because the format of basic savepoints and checkpoints is the same right now. The restriction in the doc is probably there in case that the checkpoint format will diverge more in the future. Best, Stefan > On 9. Jan 2019, at 13:12, Edward

Re: Zookeeper shared by Flink and Kafka

2019-01-09 Thread Stefan Richter
Hi, That is more a ZK question than a Flink question, but I don’t think there is a problem. Best, Stefan > On 9. Jan 2019, at 13:31, min@ubs.com wrote: > > Hi, > > I am new to Flink. > > I have a question: > Can a zookeeper cluster be shared by a flink cluster and a kafka cluster? >

windowAll and AggregateFunction

2019-01-09 Thread CPC
Hi all, In our implementation, we are consuming from Kafka and calculating distinct counts with HyperLogLog. We are using the windowAll function with a custom AggregateFunction, but the Flink runtime shows somewhat unexpected behavior. Our sources run with parallelism 4, and I expect add

Zookeeper shared by Flink and Kafka

2019-01-09 Thread min.tan
Hi, I am new to Flink. I have a question: Can a ZooKeeper cluster be shared by a Flink cluster and a Kafka cluster? Regards, Min
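As Stefan's reply above suggests, sharing an ensemble is generally fine as long as each system keeps its own znode namespace. A sketch of what that could look like (the hostnames are placeholders for illustration):

```yaml
# flink-conf.yaml: Flink HA using a shared ZooKeeper ensemble,
# isolated under its own root znode.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.zookeeper.path.root: /flink

# Kafka side (server.properties) would isolate itself with a chroot, e.g.:
# zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
```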

Can checkpoints be used to migrate jobs between Flink versions ?

2019-01-09 Thread Edward Rojas
Hello, For upgrading jobs between Flink versions I follow the guide in the doc here: https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/upgrading.html#upgrading-the-flink-framework-version It states that we should always use savepoints for this procedure, I followed it and it works

Flink Streaming Job Task is getting cancelled silently and causing job to restart

2019-01-09 Thread sohimankotia
Hi, I am running a Flink Streaming job with version 1.5.5. - Job is basically reading from Kafka, windowing on 2 minutes, and writing to HDFS using an Avro BucketingSink. - Job is running with parallelism 132. - Checkpointing is enabled with an interval of 1 minute. - Savepoint is enabled and getting

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Wenrui Meng
Hi Till, This job is not on AthenaX but on a special Uber version of Flink. I tried to ping the connected host from the connecting host; it seems very stable. For the connection timeout, I do set it to 20 min, but it still reports the timeout after 2 minutes. Could you let me know how you test locally

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Till Rohrmann
Hi Wenrui, I executed AutoParallelismITCase#testProgramWithAutoParallelism and set a breakpoint at NettyClient.java:102 to see whether the configured timeout value is correctly set. Moreover, I did the same for AbstractNioChannel.java:207, and it looked as if the correct timeout value was set.
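One detail worth noting for this thread: the "timeout after 2 minutes" reported above matches Flink's default netty client connect timeout of 120 seconds, which would be the symptom if the 20-minute setting were applied under the wrong key. Assuming the option name has not changed in the version in use, the relevant flink-conf.yaml entry would be:

```yaml
# flink-conf.yaml: netty client connect timeout in seconds
# (default 120 s, i.e. the 2 minutes observed above).
taskmanager.network.netty.client.connectTimeoutSec: 1200
```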