Re: Streaming data from AWS S3 Source

2023-11-23 Thread Feng Jin
Hi Neelabh, you can use the FileSystem connector: for DataStream see [1], for the Table API see [2]. You also need to add the necessary dependencies to your Flink environment [3]. For the Flink SQL setup, you can refer to the `sql getting started module` [4]. [1] https://nightlies.apache.org/flink/flink-docs-master/docs/c

Re: Streaming queries in FTS using Kafka log

2022-12-21 Thread Alexander Sorokoumov
Hello everyone, Answering my own question, it turns out that Flink Table Store removes the normalization node on read from an external log system only if log.changelog-mode='all' and log.consistency = 'transactional' [1]. 1. https://github.com/apache/flink-table-store/blob/7e0d55ff3dc9fd48455b17d

Re: Streaming

2018-06-27 Thread zhangminglei
Forwarding Shimin's mail to Aitozi. Aitozi, we are using HyperLogLog to count daily UV, but it only provides an approximate value. I also tried count distinct in a Flink table without a window, but needed to set the retention time. However, the time resolution of this operator is 1 millisecond, so it
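HyperLogLog itself is fairly involved, but the approximate-distinct-count idea the thread discusses can be illustrated with the simpler linear-counting estimator. The following is a plain-Java sketch for illustration only, not the code from the thread:

```java
import java.util.BitSet;

// Illustrative linear-counting sketch: like HyperLogLog, it estimates the
// number of distinct elements in bounded memory, trading exactness for space.
// Each element hashes to one bit; the estimate is -m * ln(fraction of zero bits).
class LinearCounter {
    private final BitSet bits;
    private final int m;   // number of bits in the sketch

    LinearCounter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    void add(String element) {
        // floorMod guards against negative hashCode values
        bits.set(Math.floorMod(element.hashCode(), m));
    }

    long estimate() {
        int zeros = m - bits.cardinality();
        return Math.round(-m * Math.log((double) zeros / m));
    }
}
```

Like HyperLogLog, the sketch uses fixed memory no matter how many duplicates arrive, which is what makes it attractive for daily-UV counting compared to an exact count distinct with retention state.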

Re: Streaming

2018-06-27 Thread zhangminglei
To Aitozi. Cheers, Minglei > On 27 Jun 2018, at 17:46, shimin yang wrote: > > Aitozi > > We are using hyperloglog to count daily uv, but it only provided an > approximate value. I also tried the count distinct in flink table without > window, but need to set the retention time. > > However, the time

Re: Streaming job gets slower and slower

2017-09-04 Thread Till Rohrmann
Hi Aparup, the slow-down can have multiple reasons. One reason could be that your computation in Timeseries-Analytics becomes more complex over time and therefore it slows down resulting in back pressure at the sources. This could be, for example, caused by accumulating a large state. Here the que

Re: Streaming Graph processing

2017-07-04 Thread Paris Carbone
I cannot answer that for sure since graph streams are still a research topic. It depends on the demand and how fast graph stream representations and operations will become adopted. If there is high demand on Flink we can definitely start a FLIP at some point but for now it makes sense to see how

Re: Streaming Graph processing

2017-06-30 Thread Ameet BD
Hi Paris, Thanks for the reply. Any idea when Gelly-Stream will become part of the official Flink distribution? Regards, Ameet On Fri, Jun 30, 2017 at 8:20 PM, Paris Carbone wrote: > Hi Ameet, > > Flink’s Gelly currently operates on the DataSet model. > However, we have an experimental project w

Re: Streaming Graph processing

2017-06-30 Thread Paris Carbone
Hi Ameet, Flink’s Gelly currently operates on the DataSet model. However, we have an experimental project with Vasia (Gelly-Stream) that does exactly that. You can check it out and let us know directly what you think: https://github.com/vasia/gelly-streaming Paris On 30 Jun 2017, at 13:17, Ame

Re: Streaming - memory management

2016-08-31 Thread Stephan Ewen
If you use RocksDB, you will not run into OutOfMemory errors. On Wed, Aug 31, 2016 at 6:34 PM, Fabian Hueske wrote: > Hi Vinaj, > > if you use user-defined state, you have to manually clear it. > Otherwise, it will stay in the state backend (heap or RocksDB) until the > job goes down (planned or

Re: Streaming - memory management

2016-08-31 Thread Fabian Hueske
Hi Vinaj, if you use user-defined state, you have to manually clear it. Otherwise, it will stay in the state backend (heap or RocksDB) until the job goes down (planned or due to an OOM error). This is esp. important to keep in mind, when using keyed state. If you have an unbounded, evolving key s

Re: Streaming - memory management

2016-08-31 Thread Vinay Patil
Hi Stephan, Just wanted to jump into this discussion regarding state. So do you mean that if we maintain user-defined state (for non-window operators) and do not clear it explicitly, the data for that key remains in RocksDB? What happens in case of a checkpoint? I read in the documen

Re: Streaming - memory management

2016-08-31 Thread Stephan Ewen
In streaming, memory is mainly needed for state (key/value state). The exact representation depends on the chosen StateBackend. State is explicitly released: For windows, state is cleaned up automatically (firing / expiry), for user-defined state, keys have to be explicitly cleared (clear() method
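The lifecycle Stephan describes for user-defined keyed state (query, update, explicit clear) can be modeled without Flink. The class below is an illustrative plain-Java stand-in, not Flink's actual `ValueState` API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of per-key value state with the value()/update()/clear()
// lifecycle from the thread. Keys that are never cleared stay in the
// backend until the job goes down -- the point Fabian and Stephan make.
class KeyedValueState<K, V> {
    private final Map<K, V> backend = new HashMap<>();

    V value(K key) { return backend.get(key); }       // null if never set
    void update(K key, V v) { backend.put(key, v); }
    void clear(K key) { backend.remove(key); }        // explicit cleanup
    int size() { return backend.size(); }             // state held by the backend
}
```

With an unbounded, evolving key space, `size()` grows forever unless `clear()` is called per key, which is why the thread stresses manual cleanup for non-window state.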

Re: Streaming KV store abstraction

2016-03-24 Thread Nam-Luc Tran
>Sorry for the late answer, I completely missed this email. (Thanks Robert for pointing out). No problem ;) >Now that you have everything set up, in flatMap1 (for events) you would query the state : state.value() and enrich your data >in flatMap2 you would update the state: state.update(newState)
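The enrich-in-flatMap1 / update-in-flatMap2 pattern quoted above can be sketched as a plain-Java object with the same two-input shape as Flink's `CoFlatMapFunction`. Class and method names are illustrative; there is no Flink dependency here:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two-input enrichment operator: flatMap2 applies state updates,
// flatMap1 enriches incoming events by querying the shared state.
class EnrichmentOperator {
    private final Map<String, String> state = new HashMap<>();
    final List<String> output = new ArrayList<>();

    // event stream: query the state and emit the enriched record
    void flatMap1(String key, String event) {
        String enrichment = state.getOrDefault(key, "unknown");
        output.add(event + "|" + enrichment);
    }

    // update stream: write the new state for this key
    void flatMap2(String key, String newState) {
        state.put(key, newState);
    }
}
```

In real Flink code the two methods would live in a `CoFlatMapFunction` on connected, keyed streams, with the map replaced by partitioned state.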

Re: Streaming KV store abstraction

2016-03-23 Thread Gyula Fóra
Hi! Sorry for the late answer, I completely missed this email. (Thanks Robert for pointing out). You won't be able to use that project as it was dependent on an earlier snapshot version that still had completely different state semantics. I don't think it is realistic that I will re-implement this

Re: Streaming KV store abstraction

2016-03-19 Thread Nam-Luc Tran
Hi Gyula, I'm currently looking after ways to enrich streams with external data. Have you got any update on the topic in general or on StreamKV? I've checked out the code but it won't build, mainly because StateCheckpointer has been removed since [FLINK-2808]. Any hint on a quick replacement, bef

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-14 Thread Márton Balassi
Hey, I think we came to the agreement that this PR is not mergeable right now, so I am closing it. I personally find it inconsistent not to have the full API mirrored in Scala though, but this is something that we can revisit when preparing 2.0. Best, Marton On Mon, Mar 14, 2016 at 8:14 PM, S

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-14 Thread Stefano Baghino
+1 for considering this as a bug. However, I do realize that API compatibility is extremely important when you commit to it, even if present behavior is not ideal. Silly idea (which is only applicable to Scala APIs, unfortunately): I'm currently working on a set of API extensions to solve FLINK-1

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-14 Thread Till Rohrmann
I agree with Aljoscha on this one, because `DataStreamSink` only contains setters which are compatible with the Scala API. On Mon, Mar 14, 2016 at 11:02 AM, Aljoscha Krettek wrote: > By the way, I don’t think it’s a bug that addSink() returns the Java > DataStreamSink. Having a Scala specific ve

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-14 Thread Aljoscha Krettek
By the way, I don’t think it’s a bug that addSink() returns the Java DataStreamSink. Having a Scala specific version of a DataStreamSink would not add functionality in this place, just code bloat. > On 14 Mar 2016, at 10:05, Fabian Hueske wrote: > > Yes, we will have more of these issues in the

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-14 Thread Fabian Hueske
Yes, we will have more of these issues in the future and each issue will need a separate discussion. I don't think that clearly unintended errors (I hope we won't find any intended errors) are a sufficient reason to break a stable API. IMO, the question that needs to be answered is how much of

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-13 Thread Gyula Fóra
Hi, I think this is an important question that will surely come up in some cases in the future. I see your point Robert, that we have promised api compatibility for 1.x.y releases, but I am not sure that this should cover things that are clearly just unintended errors in the api from our side. I

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-13 Thread Chesnay Schepler
On 13.03.2016 12:14, Robert Metzger wrote: I think it's too early to fork off a 2.0 branch. I have absolutely no idea when a 2.0 release becomes relevant, could easily be a year from now. At first I was going to agree with Robert, but then... I mean, the issue with not allowing breaking changes is

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-13 Thread Márton Balassi
This approach means that we will have API breaking changes lingering around in random people's remote repos that will be untrackable in no time and people will not be able to build on each others changes. Whoever will have the pleasure to eventually merge those together will be on the receiving end

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-13 Thread Robert Metzger
I think it's too early to fork off a 2.0 branch. I have absolutely no idea when a 2.0 release becomes relevant; it could easily be a year from now. The API stability guarantees don't forbid adding new methods. Maybe we can find a good way to resolve the issue without changing the signature of existing

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-13 Thread Márton Balassi
Ok, if that is what we promised let's stick to that. Then would you suggest to open a release-2.0 branch and merge it there? On Sun, Mar 13, 2016 at 11:43 AM, Robert Metzger wrote: > Hey, > JIRA was down for quite a while yesterday. Sadly, I don't think we can > merge the change because its API

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-13 Thread Robert Metzger
Hey, JIRA was down for quite a while yesterday. Sadly, I don't think we can merge the change because it's API-breaking. One of the promises of the 1.0 release is that we are not breaking any APIs in the 1.x.y series of Flink. We can fix those issues with a 2.x release. On Sun, Mar 13, 2016 at 5:27

Re: [streaming, scala] Scala DataStream#addSink returns Java DataStreamSink

2016-03-12 Thread Márton Balassi
The JIRA issue is FLINK-3610. On Sat, Mar 12, 2016 at 8:39 PM, Márton Balassi wrote: > > I have just come across a shortcoming of the streaming Scala API: it > completely lacks the Scala implementation of the DataStreamSink and > instead the Java version is used. [1] > > I would regard this as a

Re: Streaming Iterations, no headOperator ?

2016-01-02 Thread Aljoscha Krettek
Hi, the iteration operators (head and tail) don't have a StreamOperator, they are pure tasks. On Sat, Jan 2, 2016, 21:08 Matthias J. Sax wrote: > Hi, > > I am working on FLINK-1870 and my changes break some unit tests. The > problem is in streaming.api.IterateTest. > > I tracked the problem down

Re: Streaming statefull operator with hashmap

2015-11-18 Thread Stephan Ewen
For initializing the Map manually, I meant making "null" the default value and writing the code like HashMap map = state.value() if (map == null) { map = new HashMap<>(); } rather than expecting the state to always clone you a new empty map On Thu, Nov 12, 2015 at 11:29 AM, Aljoscha Krettek w
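Stephan's null-default pattern can be written out as a minimal self-contained sketch. This is plain Java for illustration; the `state.value()`/`state.update()` calls of the real API are reduced to a field:

```java
import java.util.HashMap;

// Lazy initialization of map-valued state: the default is null, and the
// map is created on first access, rather than the backend cloning a fresh
// empty map for every key that is touched.
class LazyMapState {
    private HashMap<String, Long> map;   // stands in for state.value(); null until first use

    long increment(String key) {
        if (map == null) {               // default value is null
            map = new HashMap<>();       // initialize manually, exactly once
        }
        long next = map.getOrDefault(key, 0L) + 1;
        map.put(key, next);              // stands in for state.update(map)
        return next;
    }
}
```

The design point from the thread: making null the default avoids cloning an empty map for every key, at the cost of one null check per access.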

Re: Streaming statefull operator with hashmap

2015-11-12 Thread Aljoscha Krettek
Hi, you can do it using the register* methods on StreamExecutionEnvironment. So, for example: // set up the execution environment final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.registerType(InputType.class); env.registerType(MicroModel.class); I

Re: Streaming statefull operator with hashmap

2015-11-11 Thread Martin Neumann
Thanks for the help. TypeExtractor.getForObject(modelMapInit) did the job. It's possible that it's an IDE problem that .getClass() did not work; IntelliJ is a bit fiddly with those things. 1) Making null the default value and initializing manually is probably more efficient, because otherwise the

Re: Streaming statefull operator with hashmap

2015-11-11 Thread Stephan Ewen
It should suffice to do something like "getRuntimeContext().getKeyValueState("microModelMap", new HashMap().getClass(), null);" Two more comments: 1) Making null the default value and initializing manually is probably more efficient, because otherwise the empty map would have to be cloned each t

Re: Streaming statefull operator with hashmap

2015-11-11 Thread Gyula Fóra
Hey, Yes, what you wrote should work. You can alternatively use TypeExtractor.getForObject(modelMapInit) to extract the type information. I also like to implement my own custom type info for HashMaps and the other types and use that. Cheers, Gyula Martin Neumann wrote (on 2015. nov. 11., Sz

Re: streaming GroupBy + Fold

2015-10-14 Thread Márton Balassi
Thanks for the update. On Wed, Oct 14, 2015 at 10:12 AM, Martin Neumann wrote: > Hej, > > I checked the last Flink trunk version together with Aljoscha and the > problems are gone by now. (Just to close this discussion thread now) > > cheers Martin > > On Wed, Oct 7, 2015 at 1:21 PM, Aljoscha Kr

Re: streaming GroupBy + Fold

2015-10-14 Thread Martin Neumann
Hej, I checked the latest Flink trunk version together with Aljoscha and the problems are gone now. (Just to close this discussion thread.) cheers Martin On Wed, Oct 7, 2015 at 1:21 PM, Aljoscha Krettek wrote: > Hi, > I ran it using the attached TimeShift.java and I didn't get any key > cr

Re: streaming GroupBy + Fold

2015-10-07 Thread Aljoscha Krettek
Hi, I ran it using the attached TimeShift.java and I didn't get any key cross-talk. Could you please try my example, or verify that the problem still persists on your side? I replaced the source by a source that just creates random strings. On Tue, 6 Oct 2015 at 09:56 Martin Neumann wrote: >

Re: streaming GroupBy + Fold

2015-10-06 Thread Martin Neumann
The window is actually part of the workaround we are currently using (I should have commented it out), where we use a window and a MapFunction instead of a Fold. Originally I was running fold without a window and facing the same problems. The workaround works for now, so there is no urgency on that one. I just

Re: streaming GroupBy + Fold

2015-10-05 Thread Aljoscha Krettek
Hi, If you are using a fold you are using none of the new code paths. I will add support for Fold to the new windowing implementation today, though. Cheers, Aljoscha On Mon, 5 Oct 2015 at 23:49 Márton Balassi wrote: > Martin, I have looked at your code and you are running a fold in a window, >

Re: streaming GroupBy + Fold

2015-10-05 Thread Márton Balassi
Martin, I have looked at your code and you are running a fold in a window, that is a very important distinction - the code paths are separate. Those code paths have been recently touched by Aljoscha if I am not mistaken. I have mocked up a simple example and could not reproduce your problem unfort

Re: streaming GroupBy + Fold

2015-10-05 Thread Márton Balassi
Thanks, I am checking it out tomorrow morning. On Mon, Oct 5, 2015 at 9:59 PM, Martin Neumann wrote: > Hej, > > Sorry it took so long to respond I needed to check if I was actually > allowed to share the code since it uses internal datasets. > > In the appendix of this email you will find the ma

Re: streaming GroupBy + Fold

2015-10-05 Thread Martin Neumann
Hej, Sorry it took so long to respond I needed to check if I was actually allowed to share the code since it uses internal datasets. In the appendix of this email you will find the main class of this job without the supporting classes or the actual dataset. If you want to run it you need to repla

Re: streaming GroupBy + Fold

2015-10-03 Thread Márton Balassi
Hey, Thanks for reporting the problem, Martin. I have not merged the PR Stephan is referring to yet. [1] There I am cleaning up some of the internals too. Just out of curiosity, could you share the code for the failing test please? [1] https://github.com/apache/flink/pull/1155 On Fri, Oct 2, 201

Re: streaming GroupBy + Fold

2015-10-02 Thread Martin Neumann
One of my colleagues found it today while we were hunting bugs. We were using the latest 0.10 version pulled from Maven this morning. The program we were testing is new code, so I can't tell you if the behavior has changed or if it was always like this. On Fri, Oct 2, 2015 at 7:46 PM, Stepha

Re: streaming GroupBy + Fold

2015-10-02 Thread Stephan Ewen
I think these operations were recently moved to the internal state interface. Did the behavior change then? @Marton or Gyula, can you comment? Is it per chance not mapped to the partitioned state? On Fri, Oct 2, 2015 at 6:37 PM, Martin Neumann wrote: > Hej, > > In one of my Programs I run a Fol

Re: Streaming KV store abstraction

2015-09-15 Thread Stephan Ewen
I think that is actually a cool way to kick off an addition to the system. It gives you a lot of flexibility in releasing and testing... It helps, though, to upload Maven artifacts for it! On Tue, Sep 15, 2015 at 7:18 PM, Gyula Fóra wrote: > Hey All, > > We decided to make this a standalone librar

Re: Streaming KV store abstraction

2015-09-15 Thread Gyula Fóra
Hey All, We decided to make this a standalone library until it is stable enough, and then we can decide whether we want to keep it like that or include it in the project: https://github.com/gyfora/StreamKV Cheers, Gyula Gianmarco De Francisci Morales wrote (on 9 Sep 2015, Wed, 20:25)

Re: Streaming KV store abstraction

2015-09-09 Thread Gianmarco De Francisci Morales
Yes, pretty clear. I guess semantically it's still a co-group, but implemented slightly differently. Thanks! -- Gianmarco On 9 September 2015 at 15:37, Gyula Fóra wrote: > Hey Gianmarco, > > So the implementation looks something different: > > The update stream is received by a stateful KVStor

Re: Streaming KV store abstraction

2015-09-09 Thread Gyula Fóra
Hey Gianmarco, So the implementation looks somewhat different: The update stream is received by a stateful KVStoreOperator which stores the K-V pairs as its partitioned state. The query for the 2 cities is assigned an ID, yes, and is split on the 2 cities, and each of these is sent to the sa

Re: Streaming KV store abstraction

2015-09-09 Thread Gianmarco De Francisci Morales
Just a silly question. For the example you described, in a data flow model, you would do something like this: Have query ids added to the city pairs (qid, city1, city2), then split the query stream on the two cities and co-group it with the updates stream ((city1, qid) , (city, temp)), same for ci
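The dataflow Gianmarco outlines, split the two-city query per city, answer each half against the per-city update state, then re-join the halves on the query id, can be sketched in plain Java. All names here are illustrative; this is not StreamKV code:

```java
import java.util.HashMap;
import java.util.Map;

// Two-city query over a KV update stream: updates maintain per-city state;
// each half of a query reads one city's value, and the halves are merged
// by query id once both have arrived.
class TwoCityQuery {
    private final Map<String, Double> cityTemp = new HashMap<>();   // state from the update stream
    private final Map<Integer, Double> pending  = new HashMap<>();  // first half, keyed by qid

    void update(String city, double temp) {
        cityTemp.put(city, temp);
    }

    // Returns the combined result once both halves of query `qid` have
    // arrived; returns null while the other half is still outstanding.
    Double queryHalf(int qid, String city) {
        double t = cityTemp.getOrDefault(city, Double.NaN);
        Double first = pending.remove(qid);
        if (first == null) {
            pending.put(qid, t);
            return null;
        }
        return first + t;   // e.g. the sum of the two cities' values
    }
}
```

Semantically this is still the co-group on (city, qid) from the question; the KVStoreOperator variant simply keeps both the state and the pending halves inside one stateful operator.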

Re: Streaming KV store abstraction

2015-09-08 Thread Aljoscha Krettek
That's a very nice application of the Stream API and partitioned state. :D I think we should run some tests on a cluster based on this to see what kind of throughput the partitioned state system can handle and also how it behaves with larger numbers of keys. The KVStore is just an interface and t

Re: Streaming KV store abstraction

2015-09-08 Thread Gyula Fóra
@Stephan: Technically speaking this is really just a partitioned key-value state and a fancy operator executing special operations on this state. From the user's perspective though this is something hard to implement. If you want to share state between two streams for instance this way (getting u

Re: Streaming KV store abstraction

2015-09-08 Thread Stephan Ewen
@Gyula Can you explain a bit what this KeyValue store would do more than the partitioned key/value state? On Tue, Sep 8, 2015 at 2:49 PM, Gábor Gévay wrote: > Hello, > > As for use cases, in my old job at Ericsson we were building a > streaming system that was processing data from telephone net

Re: Streaming KV store abstraction

2015-09-08 Thread Gábor Gévay
Hello, As for use cases, in my old job at Ericsson we were building a streaming system that was processing data from telephone networks, and it was using key-value stores a LOT. For example, keeping track of various state info of the users (which cell are they currently connected to, what bearers

Re: streaming iteration

2015-07-27 Thread Gyula Fóra
Hey, The JobGraph that is executed cannot have cycles (due to scheduling reasons), that's why there is no edge between the head and tail operators. What we do instead is we add an extra source to the head and a sink to the tail (with the same parallelism), and the feedback data is passed outside o
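Gyula's description, an acyclic job graph where feedback travels through an injected source/sink pair outside the graph, can be mimicked with a simple queue-based loop. This is an illustrative sketch only; Flink's real implementation passes feedback between separately scheduled source and sink tasks:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Queue-based model of a streaming iteration: the "tail sink" pushes
// feedback into a channel outside the graph, and the "head source" drains
// it. The iteration body here doubles values until they reach a threshold.
class FeedbackLoop {
    static List<Integer> run(List<Integer> input, int threshold) {
        Queue<Integer> feedback = new ArrayDeque<>(input);  // injected head source
        List<Integer> output = new ArrayList<>();
        while (!feedback.isEmpty()) {
            int v = feedback.poll() * 2;     // the iteration body
            if (v < threshold) {
                feedback.add(v);             // tail sink feeds the value back
            } else {
                output.add(v);               // value leaves the loop
            }
        }
        return output;
    }
}
```

Because the feedback edge is outside the graph, the scheduler still sees a DAG; the price is that the loop's termination is a runtime property (here, the queue draining) rather than something visible in the topology.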

Re: Streaming topology tests to use CollectionEnvironment

2015-03-16 Thread Stephan Ewen
The CollectionsEnvironment is a different execution backend for the Batch API - based entirely on Java Collections. The use case for this backend is to embed Flink programs into other programs in an extremely lightweight fashion, without threading overheads, serialization, anything. The tests are

Re: [streaming] Regarding loops in the job graph

2015-01-21 Thread Márton Balassi
As far as I remember when we were implementing the iteration for streaming we deliberately "went off" the jobgraph and implemented the backward edge as an in-memory hook (practically the same in batch but not blocking in the backward channel). When we asked the reasoning here on the dev list we go