Re: method meaning of class Trigger

2018-07-16 Thread vino yang
Hi Soheil, Flink support many kinds of time window. The "onXXXTime" interface method would be triggered based on the time characteristic you chose, you should read the documentation that Chesnay provided and this documentation :

Re: AvroInputFormat NullPointerException issues

2018-07-16 Thread vino yang
Hi Porritt, Based on the exception stack trace you provided, it seems the exception occurs when initializing Avro schema. You did not give the definition of the MyAvroSchema Class, so I'd to suggest you : 1. make sure the file path "file:///home/myuser/test.avro" exists in your tm node which run

Re: Some question about document

2018-07-16 Thread Yuta Morisawa
Hi yang Thank you for your comment. I read document and got an idea. Flink uses custom serializers on known types, and the fallback is kryo. The other arbitary objects is serialized by kryo. On 2018/07/12 12:14, vino yang wrote: Hi Yuta, It seems Chesnay is right. The "fallback" in flink's

Re: Ever increasing key space

2018-07-16 Thread Yun Tang
Hi Chen >From your description, I think you called keyedState.clear() to clear up the >key which has not been seen for several minutes. * For HeapKeyedStateBackend, it will just remove the related content from memory immediately, no worry about the increasing checkpoint size. * For

Why is flink master bump version to 1.7?

2018-07-16 Thread 陈梓立
Hi, I see no 1.6 branch or tag. What's the reason we skip 1.6 and now 1.7-SNAPSHOT? or there is a 1.6 I miss. Best, tison

Re: FW: high availability with automated disaster recovery using zookeeper

2018-07-16 Thread Scott Kidder
Hi Tovi, we run all services (Flink, Zookeeper, Hadoop HDFS, and Consul) in a Kubernetes cluster in each data center. Kubernetes will automatically restart/reschedule any services that crash or become unhealthy. This is a little outside the scope of Flink, and I'd be happy to discuss it further

RE: Parallelism and keyed streams

2018-07-16 Thread Martin, Nick
Is value(index-1) stored in Keyed State, or just a local variable inside the operator? -Original Message- From: Nicholas Walton [mailto:nwal...@me.com] Sent: Monday, July 16, 2018 1:33 PM To: user@flink.apache.org Subject: Parallelism and keyed streams I have a stream of tuples ,

flink javax.xml.parser Error

2018-07-16 Thread antonio saldivar
Hello I am getting this error when I run my application in Ambari local-cluster and I get this error at runtime. Flink 1.4.2 phoenix hbase Does any one have any recommendation to solve this issue? javax.xml.parsers.FactoryConfigurationError: Provider for class

Parallelism and keyed streams

2018-07-16 Thread Nicholas Walton
I have a stream of tuples , which I form into a keyedStream using keyBy on channel. I then need to process each channel in parallel. Each parallel stream must be processed in strict sequential order by index to calculate the ratios value(index)/value(index-1). If I set parallelism to 1 all is

Ever increasing key space

2018-07-16 Thread burgesschen
Hi every one, We are building a flink job that keys on a dynamic value. Only a few events share the same key and events with new keys are consumed constantly. For each key, there are some keyedState created the first time it is seen. And we clean up the keyedState if the key has not been seen

Re: State sharing across trigger and evictor

2018-07-16 Thread vino yang
Hi Jayant, As Fabian said, Evictor can not access flink state. But if you really has your own requirement. You can try to customize a special trigger (with evictor function), may be that could match your requirement. Thanks. vino. 2018-07-16 17:17 GMT+08:00 Fabian Hueske : > Hi, > > I don't

Re: rowTime from json nested timestamp field in SQL-Client

2018-07-16 Thread Ashwin Sinha
Thanks Timo for the clarification, but our processing actually involves aggregations on huge past data also, which won't be served by processing time. Is this a WIP feature? On Mon, Jul 16, 2018 at 7:29 PM Timo Walther wrote: > Hi Ashwin, > > the SQL Client is in an early development stage

Re: event time and late events - documentation

2018-07-16 Thread Elias Levy
Tovi, The document here should answer your question. If it doesn't, please let me know. On Mon, Jul 16, 2018 at 5:17 AM Sofer, Tovi wrote: > Hi group, > > Can someone please elaborate on the

Re: Elasticsearch 6.3.x connector

2018-07-16 Thread vino yang
Hi miki, Flink does not provide a connector for ElasticSearch 6 yet. There is this JIRA issue to track the development progress [1]. Based on the issues's status, it is in progress. Flink's documentation about the elasticsearch connector is right, current stable version is 5.x. And the maven

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-16 Thread Piotr Nowojski
Hi Gerard, I second to what Zhijiang wrote. Please check GC pauses, either via GC logging, 3rd party tool like jconsole (or some memory profiler) or via enabling resource logging in Flink. After confirming that this is not the issue next time this happens, instead of cancelling the job,

Re: rowTime from json nested timestamp field in SQL-Client

2018-07-16 Thread Timo Walther
Hi Ashwin, the SQL Client is in an early development stage right now and has some limitations. Your problem is one of them. I files an issue for this: https://issues.apache.org/jira/browse/FLINK-9864 There is no easy solution to fix this problem. Maybe you can use processing-time for your

When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-16 Thread gerardg
Hi, Our deployment consists of a standalone HA cluster of 8 machines with an external Zookeeper cluster. We have observed several times that when a jobmanager fails and a new one is elected, the new one tries to restart more jobs than the ones that were running and since it can't find some

Re: TumblingProcessingTimeWindow emits extra results for a same window

2018-07-16 Thread Hequn Cheng
Ah, cool. I was thinking register a timer at T and will be triggered at T+1ms. On Mon, Jul 16, 2018 at 7:26 PM, Aljoscha Krettek wrote: > On a side note: even if we change this off-by-one bug, I think there can > still be races because current processing time is queried using >

Re: method meaning of class Trigger

2018-07-16 Thread Chesnay Schepler
Did you read the documentation? https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#triggers On 16.07.2018 14:45, Soheil Pourbafrani wrote: Hi, In extending class Trigger we have methods like onElement, onProcessingTime and onEventTime. I know the method

Re: Multi-tenancy environment with mutual auth

2018-07-16 Thread ashish pok
Nico, This is great news. This is exactly what we are looking for. - Ashish On Monday, July 16, 2018, 8:28 AM, Nico Kruber wrote: Hi Ashish, this was just merged today for Flink 1.6. Please have a look at https://github.com/apache/flink/pull/6326 to check whether this fulfils your needs.

method meaning of class Trigger

2018-07-16 Thread Soheil Pourbafrani
Hi, In extending class Trigger we have methods like onElement, onProcessingTime and onEventTime. I know the method onElement will be called when a element is added to the window, but I have no idea about when two other methods onProcessingTime and onEventTime will be called?

Re: Multi-tenancy environment with mutual auth

2018-07-16 Thread Nico Kruber
Hi Ashish, this was just merged today for Flink 1.6. Please have a look at https://github.com/apache/flink/pull/6326 to check whether this fulfils your needs. Nico On 14/07/18 14:02, ashish pok wrote: > All, > > We are running into a blocking production deployment issue. It looks > like Flink

rowTime from json nested timestamp field in SQL-Client

2018-07-16 Thread Ashwin Sinha
Hi Users, In Flink1.5 SQL CLient , we are trying to define rowTime from a nested JSON element, but struggling with syntax. JSON data format: https://pastebin.com/ByCLhEnF YML table config:

AvroInputFormat NullPointerException issues

2018-07-16 Thread Porritt, James
I've been trying to use the following code: ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); Path path = new Path("file:///home/myuser/test.avro"); AvroInputFormat my_format = new AvroInputFormat<>(path, MyAvroSchema.class); DataSet

event time and late events - documentation

2018-07-16 Thread Sofer, Tovi
Hi group, Can someone please elaborate on the comment at the end of section "Debugging Windows & Event Time"? Didn't understand it meaning. https://ci.apache.org/projects/flink/flink-docs-master/monitoring/debugging_event_time.html "Handling Event Time Stragglers Approach 1: Watermark stays late

Re: Persisting Table in Flink API

2018-07-16 Thread Hequn Cheng
Hi Shivam, I think the non-window stream-stream join can solve your problem. The non-window join will store all data from both inputs and output joined results. The semantics of non-window join is exactly the same with batch join. One important thing to note is that the state of join might grow

Re: Persisting Table in Flink API

2018-07-16 Thread vino yang
Hi Shivam, Thanks for providing more details about your use case. So I know you mean two DataStream non-window join. There are two ways to implement this : 1、user Flink's table/sql non-window join for Streaming : this way the messages stored in state by Flink, you may not care the state but you

FW: high availability with automated disaster recovery using zookeeper

2018-07-16 Thread Sofer, Tovi
Thank you Scott, Looks like a very elegant solution. How did you manage high availability in single data center? Thanks, Tovi From: Scott Kidder Sent: יום ו 13 יולי 2018 01:13 To: Sofer, Tovi [ICG-IT] Cc: user@flink.apache.org Subject: Re: high availability with automated disaster recovery

Re: Flink on Mesos: containers question

2018-07-16 Thread Fabian Hueske
Hi Alexei, Till (in CC) is familiar with Flink's Mesos support in 1.4.x. Best, Fabian 2018-07-13 15:07 GMT+02:00 NEKRASSOV, ALEXEI : > Can someone please clarify how Flink on Mesos in containerized? > > > > On 5-node Mesos cluster I started Flink (1.4.2) with two Task Managers. > Mesos shows

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-16 Thread Fabian Hueske
Hi Gerard, Thanks for reporting this issue. I'm pulling in Nico and Piotr who have been working on the networking stack lately and might have some ideas regarding your issue. Best, Fabian 2018-07-13 13:00 GMT+02:00 Zhijiang(wangzhijiang999) < wangzhijiang...@aliyun.com>: > Hi Gerard, > > I

Re: TumblingProcessingTimeWindow emits extra results for a same window

2018-07-16 Thread Aljoscha Krettek
On a side note: even if we change this off-by-one bug, I think there can still be races because current processing time is queried using System.currentTimeMillis() and we set timers using a ScheduledThreadPoolExecutor (currently). If there's any race between those two you can also get weird

Elasticsearch 6.3.x connector

2018-07-16 Thread miki haiat
HI , I just wondered what is to status of the 6.3.x elastic connector. flink-connector-elasticsearch-base_2.11 has elastic 6.3.1 dependencies . Documentation mention 5.3 as the stable version

Flink WindowedStream - Need assistance

2018-07-16 Thread Titus Rakkesh
We have 2 independent streams which will receive elements in different frequency, DataStream> splittedActivationTuple; DataStream> unionReloadsStream; We have a requirement to keep "splittedActivationTuple" stream elements in a Window of eviction time period of 24 hours. So I created a

Re: TumblingProcessingTimeWindow emits extra results for a same window

2018-07-16 Thread Aljoscha Krettek
I think there is a bug in how processing-time timers work. For event-time, we fire timers when the watermark is >= the timestamp, this is correct because a watermark T says that we will not see elements with a timestamp smaller or equal to T. For processing time, a time of T does not say that

Re: Persisting Table in Flink API

2018-07-16 Thread Shivam Sharma
Hi Vino, First I want to tell you that we are working on Flink SQL so there is no chance to use Data Stream API. I will give one example of my use case here:- Let's say we have two Kafka Topics: 1. UserName to UserId Mapping => {"userName": "shivam", "userId": 123} 2. User transactions

Re: State sharing across trigger and evictor

2018-07-16 Thread Fabian Hueske
Hi, I don't think that is possible. The Evictor interface does not provide access to a state store, so there is no way to access state. Best, Fabian 2018-07-10 13:26 GMT+02:00 Jayant Ameta : > Hi, > I'm using the GlobalWindow with a custom CountTrigger (similar to the > CountTrigger provided

Re: Flink CLI properties with HA

2018-07-16 Thread vino yang
Hi Sampath, Flink CLI need to retrieve the JobManager leader address, so it need to access the HA specific configuration. Because if based on Zookeeper to implement the HA, the leader address information will fetch from Zookeeper. The main use of config item *high-availability.storageDir* is

Fwd: Flink CLI properties with HA

2018-07-16 Thread Sampath Bhat
-- Forwarded message -- From: Sampath Bhat Date: Fri, Jul 13, 2018 at 3:18 PM Subject: Flink CLI properties with HA To: user Hello When HA is enabled in the flink cluster and if I've to submit job via flink CLI then in the flink-conf.yaml of flink CLI should contain this

Re: Persisting Table in Flink API

2018-07-16 Thread vino yang
Hi Shivam, Can you provide more details about your use case? The join for batch or streaming? which join type (window or non-window or stream-dimension table join)? If it is stream-dimension table join and the table is huge, use Redis or some cache based on memory, can help to process your