Re: about the checkpoint and state backend

2018-01-05 Thread Jinhua Luo
points are not merged together into one directory but > the handles to those directories are sent to the CheckpointCoordinator which > creates a checkpoint that stores handles to all the states stored in DFS. > > Best, > Aljoscha > >> On 4. Jan 2018, at 15:06, Jinhua Luo <

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
; stored in one column. > >> On 4. Jan 2018, at 14:56, Jinhua Luo <luajit...@gmail.com> wrote: >> >> ok, I see. >> >> But as known, one rocksdb instance occupy one directory, so I am still >> wondering what's the relationship between the states and roc

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
point. When restoring, the > CheckpointCoordinator figures out which handles need to be sent to which > operators for restoring. > > Best, > Aljoscha > >> On 4. Jan 2018, at 14:44, Jinhua Luo <luajit...@gmail.com> wrote: >> >> OK, I think I get the point. >>

Re: does the flink sink only support bio?

2018-01-04 Thread Jinhua Luo
fi/2013/04/11/how-to-write-a-java-transaction-manager-that-works-with-postgresql/ > . > > Does this help or do you really need async read-modify-update? > > Best, > Stefan > > > Am 03.01.2018 um 15:08 schrieb Jinhua Luo <luajit...@gmail.com>: > > No,

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
you specified in the constructor. > > Best, > Aljoscha > > >> On 4. Jan 2018, at 14:23, Jinhua Luo <luajit...@gmail.com> wrote: >> >> I still do not understand the relationship between rocksdb backend and >> the filesystem (here I refer to any filesystem i

Re: keyby() issue

2018-01-04 Thread Jinhua Luo
uture" would be failed afterwards immediately. 2018-01-04 21:31 GMT+08:00 Jinhua Luo <luajit...@gmail.com>: > 2018-01-04 21:04 GMT+08:00 Aljoscha Krettek <aljos...@apache.org>: >> Memory usage should grow linearly with the number of windows you have active >> at any

Re: keyby() issue

2018-01-04 Thread Jinhua Luo
2018-01-04 21:04 GMT+08:00 Aljoscha Krettek : > Memory usage should grow linearly with the number of windows you have active > at any given time, the number of keys and the number of different window > operations you have. But the memory usage is still too much, especially

Re: about the checkpoint and state backend

2018-01-04 Thread Jinhua Luo
ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/state_backends.html > > > > Am 1/1/18 um 3:50 PM schrieb Jinhua Luo: > >> Hi All, >> >> I have two questions: >> >> a) does the records/elements themselves would be checkpointed? or just >&g

Re: keyby() issue

2018-01-04 Thread Jinhua Luo
nt-time? How do you > aggregate elements of the windows? > > Regards, > Timo > > > > Am 1/1/18 um 6:00 AM schrieb Jinhua Luo: > >> I checked the logs, but no information indicates what happens. >> >> In fact, in the same app, there is another stream, but i

Re: does the flink sink only support bio?

2018-01-03 Thread Jinhua Luo
orts ListState, right? > > > You can use non-keyed state, aka operator state, to store such information. > See here: > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#using-managed-operator-state > . It does not require a KeyedSteam. > > Best,

Re: does the flink sink only support bio?

2018-01-03 Thread Jinhua Luo
operators, e.g. fold, so it normally faces the DataStream but not KeyedStream, and DataStream only supports ListState, right? 2018-01-03 18:43 GMT+08:00 Stefan Richter <s.rich...@data-artisans.com>: > > >> Am 01.01.2018 um 15:22 schrieb Jinhua Luo <luajit...@gmail.com>: >>

Re: does the flink sink only support bio?

2018-01-01 Thread Jinhua Luo
2017-12-08 18:25 GMT+08:00 Stefan Richter : > You need to be a bit careful if your sink needs exactly-once semantics. In > this case things should either be idempotent or the db must support rolling > back changes between checkpoints, e.g. via transactions. Commits

Re: keyby() issue

2017-12-31 Thread Jinhua Luo
ot, maybe worth a > look. > > have you tried to do a thread dump? > How is the GC pause? > do you see flink restart? check the exception tab in Flink web UI for your > job. > > > > On Sun, Dec 31, 2017 at 6:20 AM, Jinhua Luo <luajit...@gmail.com> wrote: >&g

Re: keyby() issue

2017-12-31 Thread Jinhua Luo
ly frustrated that flink could be fulfill its feature just like the doc said. Thank you all. 2017-12-29 17:42 GMT+08:00 Jinhua Luo <luajit...@gmail.com>: > I misuse the key selector. I checked the doc and found it must return > deterministic key, so using random is wrong, but I still cou

Re: keyby() issue

2017-12-29 Thread Jinhua Luo
I misuse the key selector. I checked the doc and found it must return deterministic key, so using random is wrong, but I still could not understand why it would cause oom. 2017-12-28 21:57 GMT+08:00 Jinhua Luo <luajit...@gmail.com>: > It's very strange, when I change the key select

Re: keyby() issue

2017-12-28 Thread Jinhua Luo
va:809) at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51) Could anybody explain the internal of keyby()? 2017-12-28 17:33 GMT+08:00 Ufuk Celebi <u...@apache.org>: > Hey Jinhua, > > On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo <luajit...@gmail.com> wrote: &g

Re: keyby() issue

2017-12-28 Thread Jinhua Luo
within 137, correct? 2017-12-28 17:33 GMT+08:00 Ufuk Celebi <u...@apache.org>: > Hey Jinhua, > > On Thu, Dec 28, 2017 at 9:57 AM, Jinhua Luo <luajit...@gmail.com> wrote: >> The keyby() upon the field would generate unique key as the field >> value, so if the numb

keyby() issue

2017-12-28 Thread Jinhua Luo
Hi All, I need to aggregate some field of the event, at first I use keyby(), but I found the flink performs very slow (even stop working out results) due to the number of keys is around half a million per min. So I use windowAll() instead, and flink works as expected then. The keyby() upon the

Add new slave to running cluster?

2017-12-19 Thread Jinhua Luo
Hi All, If I add new slave (start taskmanager on new host) which does not included in the conf/slaves, I see below logs conintuously printed: ...Trying to register at JobManager...(attempt 147, timeout: 3 milliseconds) Is it normal? And does the new slave successfully added in the cluster?

How flink assigns task slots to the streams of the same app?

2017-12-18 Thread Jinhua Luo
Hi All, I start an app which contains multiple streams, and I found some stream of them get processed very slow, which uses keyBy() and the number of key would be up to million, and I also found all streams share only one task slot. Is it possible to assign more slots to the app?

Re: how flink extracts timestamp from transformed elements?

2017-12-18 Thread Jinhua Luo
ally evaluated and all results of > windows for the same time have the same timestamp. > > 2017-12-18 11:30 GMT+01:00 Jinhua Luo <luajit...@gmail.com>: >> >> Thanks. >> >> The keyBy() splits the stream into multiple logical streams, if I do >> timeWindow

Re: how flink extracts timestamp from transformed elements?

2017-12-18 Thread Jinhua Luo
not late) are > still aligned. > If records are aggregated in a time window, the aggregation results has the > maximum allowed timestamp of the window. For example a tumbling window of > size 1 hour that starts at 14:00 emits its results with a timestamp of > 14:59:59.999. > > Best, F

how flink extracts timestamp from transformed elements?

2017-12-16 Thread Jinhua Luo
Hi All, The timestamp assigner is for one type, normally for the type from the source, but after several operators, the element type would change and the elements would be aggregated, if I do timeWindow again, how flink extracts timestamp from elements? For example, the fold operators aggregate

netty conflict using lettuce redis client

2017-12-13 Thread Jinhua Luo
Hi All, The io.netty package included in flnk 1.3.2 is 4.0.23, while the latest lettuce-core (4.4) depends on netty 4.0.35. If I include netty 4.0.35 in the app jar, it would throw java.nio.channels.UnresolvedAddressException. It seems the netty classes are mixed between versions from app jar

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
Jinhua Luo <luajit...@gmail.com>: > If the window contains only one element, no more elements come in, > then by default (with EventTimeTrigger), the window would be fired by > next element if that element advances watermark which passes the end > of the window, correc

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
t is mean by "a window is created when the first > element arrives". > Otherwise, you'd have to fire empty windows for all possible keys (in case > of a window operator on a keyed stream) which is obviously not possible. > > 2017-12-12 9:30 GMT+01:00 Jinhua Luo <luajit...@gmail.

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
owed > lateness). If a late event arrives, you can update the result and emit an > update. In this case your downstream operators systems have to be able to > deal with updates. > 3) send the late events to a different channel via side outputs and handle > them later. > > > &g

Re: what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
.3/dev/windows.html#getting-late-data-as-a-side-output > > 2017-12-12 10:16 GMT+01:00 Jinhua Luo <luajit...@gmail.com>: >> >> Hi All, >> >> The watermark is monotonous incremental in a stream, correct? >> >> Given a stream out-of-order extremely, e.g

what's the best practice to determine event-time watermark for unordered stream?

2017-12-12 Thread Jinhua Luo
Hi All, The watermark is monotonous incremental in a stream, correct? Given a stream out-of-order extremely, e.g. e4(12:04:33) --> e3 (15:00:22) --> e2(12:04:21) --> e1 (12:03:01) Here e1 appears first, so watermark start from 12:03:01, so e3 is an early event, it would be placed in another

Re: when does the timed window ends?

2017-12-12 Thread Jinhua Luo
art time such as the built-in session window. > You can also define custom windows like that. > > Best, Fabian > > 2017-12-12 7:57 GMT+01:00 Jinhua Luo <luajit...@gmail.com>: >> >> Hi All, >> >> The document said "a window is created as soon as th

could I chain two timed window?

2017-12-11 Thread Jinhua Luo
Hi All, Given one stream source which generates 20k events/sec, and I need to aggregate the element count using sliding window of 1 hour size. The problem is, the window may buffer too many elements (which may cause a lot of block I/O because of checkpointing?), and in fact it does not necessary

when does the timed window ends?

2017-12-11 Thread Jinhua Luo
Hi All, The document said "a window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness (see Allowed Lateness).". I am

Re: does the flink sink only support bio?

2017-12-08 Thread Jinhua Luo
> > Best, > Stefan > >> Am 08.12.2017 um 08:11 schrieb Jinhua Luo <luajit...@gmail.com>: >> >> Hi, all. >> >> The invoke method of sink seems no way to make async io? e.g. returns Future? >> >> For example, the redis connector uses jedis lib

does the flink sink only support bio?

2017-12-07 Thread Jinhua Luo
Hi, all. The invoke method of sink seems no way to make async io? e.g. returns Future? For example, the redis connector uses jedis lib to execute redis command synchronously: