Re: [Exception]Key expressions are only supported on POJO types and Tuples

2015-02-11 Thread Aljoscha Krettek
From the Flink documentation: Conditions for a class to be treated as a POJO by Flink:
- The class must be public
- It must have a public constructor without arguments
- All fields either have to be public or there must be getters and setters for all non-public fields.
In your example, the
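
For illustration, a minimal class satisfying these conditions (the names are made up):

    // Public class, public no-arg constructor, fields public or with getters/setters:
    public class WordWithCount {
        public String word;   // public field: needs no accessors

        private long count;   // non-public field: needs getter and setter

        public WordWithCount() {} // public no-argument constructor

        public long getCount() { return count; }
        public void setCount(long count) { this.count = count; }
    }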

Re: Flink Java 8 problem (no lambda, simple code)

2015-04-24 Thread Aljoscha Krettek
issues - but those were related to Java 8 lambdas. Back then, bumping ASM to version 5 helped it. Not sure if this is the same problem, though, since you do not seem to use Java 8 lambdas... On Fri, Apr 24, 2015 at 11:32 AM, Aljoscha Krettek aljos...@apache.org wrote: I'm looking

Re: Apache Flink transactions

2015-06-05 Thread Aljoscha Krettek
Hi, I think the example could be made more concise by using the Table API. http://ci.apache.org/projects/flink/flink-docs-master/libs/table.html Please let us know if you have questions about that; it is still quite new. On Fri, Jun 5, 2015 at 9:03 AM, hawin hawin.ji...@gmail.com wrote: Hi

Re: Apache Flink transactions

2015-06-05 Thread Aljoscha Krettek
Yes, this code seems very reasonable. :D The way to use this to modify a file on HDFS is to read the file, then filter out some elements and write a new modified file that does not contain the filtered-out elements. As said before, Flink (or HDFS) does not allow in-place modification of files.
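
For illustration, a minimal sketch of that read-filter-write pattern with the batch API (paths and the predicate are placeholders):

    import org.apache.flink.api.java.ExecutionEnvironment;

    // Read the file, drop the unwanted records, write a new file;
    // the original stays untouched (no in-place modification).
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.readTextFile("hdfs:///data/input")              // placeholder path
       .filter(line -> !line.contains("DROP_ME"))       // placeholder predicate
       .writeAsText("hdfs:///data/input-filtered");
    env.execute("filter-rewrite");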

Re: Apache Flink transactions

2015-06-09 Thread Aljoscha Krettek
:03 AM, Aljoscha Krettek aljos...@apache.org wrote: Hi, actually, what do you want to know about Flink SQL? Aljoscha On Sat, Jun 6, 2015 at 2:22 AM, Hawin Jiang hawin.ji...@gmail.com wrote: Thanks all Actually, I want to know more info about Flink SQL and Flink performance Here

Re: Apache Flink transactions

2015-06-08 Thread Aljoscha Krettek
- part1
  - 1
  - 2
  - ...
- part2
  - 1
  - ...
- partX
Flink's file format supports recursive directory scans such that you can add new subfolders to dataSetRootFolder and read the full data set. 2015-06-05 9:58 GMT+02:00 Aljoscha Krettek aljos...@apache.org: Hi, I
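
For reference, a sketch of switching on such a recursive scan, using the documented recursive.file.enumeration parameter of the batch file inputs:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    Configuration params = new Configuration();
    // Descend into part1, part2, ..., partX under the root folder:
    params.setBoolean("recursive.file.enumeration", true);
    env.readTextFile("hdfs:///dataSetRootFolder").withParameters(params);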

Re: Using collect and accessing accumulator results

2015-06-18 Thread Aljoscha Krettek
@Ufuk, probably should, yes. On Thu, 18 Jun 2015 at 16:18 Tamara Mendt tammyme...@gmail.com wrote: Great, thanks! On Thu, Jun 18, 2015 at 4:16 PM, Ufuk Celebi u...@apache.org wrote: Should we add this to the Javadoc of the eagerly executed operations? On 18 Jun 2015, at 16:11, Maximilian

Re: Kafka0.8.2.1 + Flink0.9.0 issue

2015-06-26 Thread Aljoscha Krettek
Hi, could you please try replacing JavaDefaultStringSchema() with SimpleStringSchema() in your first example, the one where you get this exception: org.apache.commons.lang3.SerializationException: java.io.StreamCorruptedException: invalid stream header: 68617769 Cheers, Aljoscha On Fri, 26 Jun
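
A sketch of the suggested swap; FlinkKafkaConsumer082 is borrowed from later threads in this archive, and the exact 0.9.0 consumer class and constructor may differ:

    // 'env' and 'props' (Kafka consumer properties) are assumed to exist.
    // JavaDefaultStringSchema expects Java-serialized strings, which is why
    // raw message bytes trigger "invalid stream header"; SimpleStringSchema
    // just reads the bytes as a plain string.
    DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer082<>("topic", new SimpleStringSchema(), props));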

Re: Memory in local setting

2015-06-26 Thread Aljoscha Krettek
Not yet, no. I created a Jira issue: https://issues.apache.org/jira/browse/FLINK-2277 On Thu, 25 Jun 2015 at 14:48 Sebastian s...@apache.org wrote: Is there a way to configure this setting for a delta iteration in the scala API? Best, Sebastian On 17.06.2015 10:04, Ufuk Celebi wrote:

Re: Help with Flink experimental Table API

2015-06-14 Thread Aljoscha Krettek
, Shiti On Sun, Jun 14, 2015 at 4:10 PM, Shiti Saxena ssaxena@gmail.com wrote: I'll do the fix On Sun, Jun 14, 2015 at 12:42 AM, Aljoscha Krettek aljos...@apache.org wrote: I merged your PR for the RowSerializer. Teaching the aggregators to deal with null values should be a very

Re: Help with Flink experimental Table API

2015-06-14 Thread Aljoscha Krettek
approach for the test case? Thanks, Shiti On Sun, Jun 14, 2015 at 4:10 PM, Shiti Saxena ssaxena@gmail.com wrote: I'll do the fix On Sun, Jun 14, 2015 at 12:42 AM, Aljoscha Krettek aljos...@apache.org wrote: I merged your PR for the RowSerializer. Teaching the aggregators to deal

Re: Help with Flink experimental Table API

2015-06-13 Thread Aljoscha Krettek
fix it. On Thu, 11 Jun 2015 at 10:40 Aljoscha Krettek aljos...@apache.org wrote: Cool, good to hear. The PojoSerializer already handles null fields. The RowSerializer can be modified in pretty much the same way. So you should start by looking at the copy()/serialize()/deserialize() methods

Re: Help with Flink experimental Table API

2015-06-11 Thread Aljoscha Krettek
Hi, yes, I think the problem is that the RowSerializer does not support null values. I think we can add support for this; I will open a Jira issue. Another problem I then see is that the aggregations cannot properly deal with null values. This would need separate support. Regards, Aljoscha On

Re: Help with Flink experimental Table API

2015-06-15 Thread Aljoscha Krettek
on the issue with TupleSerializer or is someone working on it? On Mon, Jun 15, 2015 at 11:20 AM, Aljoscha Krettek aljos...@apache.org wrote: Hi, the reason why this doesn't work is that the TupleSerializer cannot deal with null values: @Test def testAggregationWithNull(): Unit = { val env

Re: Help with Flink experimental Table API

2015-06-16 Thread Aljoscha Krettek
)? On Tue, Jun 16, 2015 at 1:32 PM, Aljoscha Krettek aljos...@apache.org wrote: One more thing, it would be good if the TupleSerializer didn't write a boolean for every field. A single integer could be used where one bit specifies if a given field is null or not. (Maybe we should also add
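
A sketch of the bitmask idea in plain Java, independent of any concrete Flink serializer interface (the names are made up):

    import java.io.DataOutput;
    import java.io.IOException;

    // Encode which of up to 32 fields are null as bits of a single int,
    // instead of writing one boolean per field.
    static void writeNullMask(Object[] fields, DataOutput out) throws IOException {
        int mask = 0;
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) {
                mask |= 1 << i; // bit i set <=> field i is null
            }
        }
        out.writeInt(mask);
        // On read: field i is null iff (mask & (1 << i)) != 0.
    }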

Re: Building master branch is failed

2015-05-29 Thread Aljoscha Krettek
Thanks for fixing my oversight. :D On Fri, May 29, 2015 at 3:05 PM, Márton Balassi balassi.mar...@gmail.com wrote: Thanks, Max. On Fri, May 29, 2015 at 3:04 PM, Maximilian Michels m...@apache.org wrote: Fixed it on the master. Problem were some classes belonging to package

Re: fault tolerance model for stateful streams

2015-07-06 Thread Aljoscha Krettek
Hi, good questions. About 1.: you are right, when the JobManager fails the state is lost. Ufuk, Till and Stephan are currently working on making the JobManager fault tolerant by having hot-standby JobManagers and storing the important JobManager state in ZooKeeper. Maybe they can further comment on

Re: Forward Partitioning same Parallelism: 1:1 communication?

2015-08-13 Thread Aljoscha Krettek
Hi, I looked into it. Right now, when the specified partitioner is FORWARD, the JobGraph that is generated from the StreamGraph will have the POINT-TO-POINT pattern specified. This doesn't work, however, if the parallelism differs, so the operators will not have a POINT-TO-POINT connection in the

Re: Java 8 and type erasure

2015-08-18 Thread Aljoscha Krettek
Hi Kristoffer, I'm afraid not, but maybe Timo has some further information. In this extended example we can see the problem: https://gist.github.com/aljoscha/84cc363d13cf1dfe9364. The output is: Type is: class org.apache.flink.examples.java8.wordcount.TypeTest$Thing class
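
A stripped-down illustration of the general erasure issue, independent of the linked gist (all names here are made up):

    import java.util.function.Function;

    public class ErasureDemo {
        static <T> void inspect(Function<T, T> f) {
            // A lambda's runtime class implements Function without concrete
            // type arguments, so T cannot be recovered reflectively here:
            for (java.lang.reflect.Type t : f.getClass().getGenericInterfaces()) {
                System.out.println(t); // prints the raw Function interface
            }
        }

        public static void main(String[] args) {
            inspect((String s) -> s); // the String type argument is erased
        }
    }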

Re: Pointers about internal threads and communication in Flink (streaming)

2015-08-17 Thread Aljoscha Krettek
Hi Vincenzo, regarding TaskManagers and how they execute the operations: The TaskManager gets a class that is derived from AbstractInvokable. The TaskManager will create an object from that class and then call methods to facilitate execution. The two main methods are registerInputOutput() and

Re: Statefull computation

2015-08-23 Thread Aljoscha Krettek
Hi, I wanted to post something along the same lines but now I don't think the approach with local top-ks and merging works. For example, if you want to get the top-4 and do the pre-processing in two parallel instances, this input data would lead to incorrect results: 1. Instance: a 6 b 5 c 4 d 3
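
The snippet's counterexample is cut off; here is a worked one in the same spirit (counts invented). Suppose we want the global top-4 and each of two instances keeps only its local top-4:

    Instance 1: a=10  b=9  c=8  d=7   (e=6 misses the local top-4)
    Instance 2: f=10  g=9  h=8  i=7   (e=6 misses it here too)

Merging the local top-4s can never surface e, yet its global count e=12 makes it the most frequent element overall. Pre-aggregation drops exactly those keys whose weight is spread across instances.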

Re: Statefull computation

2015-08-24 Thread Aljoscha Krettek
think. Best, Aljoscha On Sun, 23 Aug 2015 at 23:07 Gyula Fóra gyula.f...@gmail.com wrote: Hey, I am not sure if I get it, why aren't the results correct? You don't instantly get the global top-k, but you are always updating it with the new local results. Gyula Aljoscha Krettek aljos

Re: NullPointerException when working with Windows

2015-07-28 Thread Aljoscha Krettek
Hi, no, this is unfortunately not fixed in the current master. Cheers, Aljoscha On Tue, 28 Jul 2015 at 15:29 Ufuk Celebi u...@apache.org wrote: Hey Phillip, thanks for reporting the problem. I think your assessment is correct. If the program is already finished, the threads throwing the

Re: DataSet Conversion

2015-07-13 Thread Aljoscha Krettek
Hi Lydia, it might work using new DataSet(javaSet) where DataSet is org.apache.flink.api.scala.DataSet. I'm not sure, however. What is your use case for this? Cheers, Aljoscha On Mon, 13 Jul 2015 at 15:55 Lydia Ickler ickle...@googlemail.com wrote: Hi guys, is it possible to convert a Java

Re: Flink Scala performance

2015-07-15 Thread Aljoscha Krettek
Hi, that depends. How are you executing the program? Inside an IDE? By starting a local cluster? And then, how big is your input data? Cheers, Aljoscha On Wed, 15 Jul 2015 at 23:45 Vinh June hoangthevinh@gmail.com wrote: I just realized that Flink program takes a lot of time to run, for

Re: Flink Kafka example in Scala

2015-07-16 Thread Aljoscha Krettek
Hi, your first example doesn't work because the SimpleStringSchema does not work for sinks. You can use this modified serialization schema: https://gist.github.com/aljoscha/e131fa8581f093915582. This works for both source and sink (I think the current SimpleStringSchema is not correct and should
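
The linked gist is not reproduced here; as a rough sketch of the idea against the Flink 1.x interfaces (the 0.9-era signatures differed), a schema usable on both the source and the sink side could look like:

    import java.nio.charset.StandardCharsets;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;
    import org.apache.flink.streaming.util.serialization.SerializationSchema;

    // Treats messages as raw UTF-8 on both the consumer and producer side.
    public class UtfStringSchema
            implements DeserializationSchema<String>, SerializationSchema<String> {

        @Override
        public String deserialize(byte[] message) {
            return new String(message, StandardCharsets.UTF_8);
        }

        @Override
        public boolean isEndOfStream(String nextElement) {
            return false; // the stream is unbounded
        }

        @Override
        public byte[] serialize(String element) {
            return element.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return BasicTypeInfo.STRING_TYPE_INFO;
        }
    }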

Re: Application-specific loggers configuration

2015-08-25 Thread Aljoscha Krettek
Hi Gwenhaël, are you using the one-yarn-cluster-per-job mode of Flink? I.e., you are starting your Flink job with (from the doc): flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096 ./examples/flink-java-examples-0.10-SNAPSHOT-WordCount.jar If you are, then this is almost possible on the current

Re: Flink Data Stream Union

2015-10-21 Thread Aljoscha Krettek
which i run the code. > > > Yes, i did forget to post here, but my program calls the unionMessageStreams() > > On Wed, Oct 21, 2015 at 9:39 AM, Aljoscha Krettek > <aljoscha.kret...@gmail.com> wrote: > Hi Gayu, > could it be that no data ever arrives on the second

Re: Flink Data Stream Union

2015-10-21 Thread Aljoscha Krettek
So does the filter maybe filter out everything? > On 21 Oct 2015, at 16:18, Gayu <gaa...@gmail.com> wrote: > > Yes, exactly. > > On Wed, Oct 21, 2015 at 10:17 AM, Aljoscha Krettek > <aljoscha.kret...@gmail.com> wrote: > So it is received in the filter but th

Re: Flink Data Stream Union

2015-10-21 Thread Aljoscha Krettek
Hi, first of all, am I correct to assume that new SocketSource(hostName1, port, '\n', -1) should be new SocketTextStreamFunction(hostName1, port1, '\n', -1), or are you using a custom-built SocketSource for this? If I replace it with SocketTextStreamFunction and execute it, the example runs and

Re: Session Based Windows

2015-10-21 Thread Aljoscha Krettek
certainly possible that I > messed something up while refactoring to the API change. I will look at > it further when I get a chance, but if you have any thoughts they are much > appreciated. > > > Thanks, > Paul Hamilton > > > On 10/17/15, 6:39 AM, "Aljos

Re: Session Based Windows

2015-10-23 Thread Aljoscha Krettek
Hi Paul, the key based state should now be fixed in the current 0.10-SNAPSHOT builds if you want to continue playing around with it. Cheers, Aljoscha > On 21 Oct 2015, at 19:40, Aljoscha Krettek <aljos...@apache.org> wrote: > > Hi Paul, > good to hear that the windo

Re: Error handling

2015-11-16 Thread Aljoscha Krettek
Hi, I don’t think that alleviates the problem. Sometimes you might want the system to continue even if stuff outside the UDF fails. For example, if a serializer does not work because of a null value somewhere. You would, however, like to get a message about this somewhere, I assume. Cheers,

Re: How best to deal with wide, structured tuples?

2015-11-05 Thread Aljoscha Krettek
Hi, these are some interesting ideas. I have some thoughts, though, about the current implementation. 1. With Schema and Field you are basically re-implementing RowTypeInfo, so it should not be required. Maybe just an easier way to create a RowTypeInfo. 2. Right now, in Flink the

Re: How to preserve KeyedDataStream

2015-11-04 Thread Aljoscha Krettek
rious thing is that in the 2nd statement .keyBy(t -> t.f1) works > but .keyBy(1) does not, even though they do the same thing. I'm using Idea at > the moment so it can be just another type inference problem with that IDE. > > cheers > Martin > > On Tue, Nov 3, 2015 at 3:06 P

Re: How to preserve KeyedDataStream

2015-11-03 Thread Aljoscha Krettek
Hi, where are you storing the results of each window computation to? Maybe you could also store it from inside a custom WindowFunction where you just count the elements and then store the results. On the other hand, adding a (1) field and doing a window reduce (à la WordCount) is going to be

Re: Session Based Windows

2015-10-17 Thread Aljoscha Krettek
Hi Paul, it’s good to see people interested in this. I sketched a Trigger that should fit your requirements: https://gist.github.com/aljoscha/a7c6f22548e7d24bc4ac You can use it like this: DataStream<> input = … DataStream<> result = input .keyBy(“session-id”)

Re: Event-Time Windowing

2015-10-07 Thread Aljoscha Krettek
Hi, right now, the 0.10-SNAPSHOT is in a bit of a weird state. We still have the old windowing API in there alongside the new one. To make your example use the new API that actually uses the timestamps and watermarks you would use the following code:

Re: Event-Time Windowing

2015-10-07 Thread Aljoscha Krettek
ll I be able to switch from event-time to > processing- or ingestion-time without having to adjust my code? > > Best, > Alex > > Aljoscha Krettek <aljos...@apache.org> schrieb am Mi., 7. Okt. 2015, > 17:23: > >> Hi, >> right now, the 0.10-SNAPSHOT

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-17 Thread Aljoscha Krettek
pojo.getTime();
> return pojo.getTime();
> }
>
> @Override
> public long extractWatermark(Pojo pojo, long l) {
>     return Long.MIN_VALUE;
> }
>
> @Override
> public long getCurrentWatermark() {
>     return lastTimestamp - maxDelay;

Re: Event time in Flink streaming

2015-08-28 Thread Aljoscha Krettek
Hi Martin, the answer depends, because the current windowing implementation has some problems. We are working on improving it in the 0.10 release, though. If your elements arrive with strictly increasing timestamps and you have parallelism=1 or don't perform any re-partitioning of data (which a

Re: [0.10-SNAPSHOT ] When naming yarn application (yarn-session -nm), flink run without -m fails.

2015-08-26 Thread Aljoscha Krettek
Hi Arnaud, I think my answer to Gwenhaël could also be helpful to you: are you using the one-yarn-cluster-per-job mode of Flink? I.e., you are starting your Flink job with (from the doc): flink run -m yarn-cluster -yn 4 -yjm 1024 -ytm 4096

Re: Flink join with external source

2015-09-05 Thread Aljoscha Krettek
Hi Jerry, it should be possible to just use the Redis API inside a Flink operator, for example a map or flatMap. You can use RichFunctions ( https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#rich-functions) to set up the connection and close it after computation
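
A sketch of that pattern; the Jedis client and its calls are an illustrative assumption, not something the answer prescribes:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import redis.clients.jedis.Jedis; // hypothetical client choice

    public class RedisLookupMap extends RichMapFunction<String, String> {
        private transient Jedis jedis; // not serializable, so create it in open()

        @Override
        public void open(Configuration parameters) {
            jedis = new Jedis("localhost", 6379); // one connection per parallel task
        }

        @Override
        public String map(String key) {
            return jedis.get(key); // enrich each element from Redis
        }

        @Override
        public void close() {
            if (jedis != null) {
                jedis.close();
            }
        }
    }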

Re: Splitting Streams

2015-09-03 Thread Aljoscha Krettek
Hi Martin, maybe this is what you are looking for: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#output-splitting Regards, Aljoscha On Thu, 3 Sep 2015 at 12:02 Till Rohrmann wrote: > Hi Martin, > > could grouping be a solution to your

Re: Performance Issue

2015-09-08 Thread Aljoscha Krettek
Hi Rico, I have a suspicion. What is the distribution of your keys? That is, are there many unique keys, do the keys keep evolving, i.e. is it always new and different keys? Cheers, Aljoscha On Tue, 8 Sep 2015 at 13:44 Rico Bergmann wrote: > I also see in the TM overview

Re: Performance Issue

2015-09-09 Thread Aljoscha Krettek
; Am 08.09.2015 um 20:05 schrieb Aljoscha Krettek <aljos...@apache.org>: > > Hi Rico, > I have a suspicion. What is the distribution of your keys? That is, are > there many unique keys, do the keys keep evolving, i.e. is it always new > and different keys? > > Cheers, >

Re: FLink Streaming - Parallelize ArrayList

2015-09-29 Thread Aljoscha Krettek
No, each operator would have its own local list. In a distributed environment it is very tricky to keep global state across all instances of operations (Flink does not support anything in this direction). If you really need it, then the only way is to set the parallelism of the operator to 1. This

Re: Immutable data

2015-09-23 Thread Aljoscha Krettek
Hi Jack, Stephan is right, this should work. Unfortunately the TypeAnalyzer does not correctly detect that it cannot treat your Id class as a Pojo. I will add a Jira issue for that. For the time being you can use this command to force the system to use Kryo: env.getConfig.enableForceKryo(); I

Re: Performance Issue

2015-09-24 Thread Aljoscha Krettek
Hi Rico, you should be able to get it with these steps: git clone https://github.com/StephanEwen/incubator-flink.git flink cd flink git checkout -t origin/windows This will get you on Stephan's windowing branch. Then you can do a mvn clean install -DskipTests to build it. I will merge his

Re: Performance Issue

2015-09-24 Thread Aljoscha Krettek
ime. But I don't know whether this could be a setup problem. I > noticed the os load of my testsystem was around 90%. So it might be more a > setup problem ... > > Thanks for your support so far. > > Cheers. Rico. > > > > > > Am 24.09.2015 um 09:33 schrieb Aljosch

Re: Window based on tuple timestamps

2015-09-18 Thread Aljoscha Krettek
Hi Philipp, am I correct to assume that your tuples do not arrive in the order of the timestamp that you extract? Unfortunately, for that case the current windowing implementation does not work correctly. We are working hard on fixing this for the upcoming 0.10 release, though. If you are

Re: Expressions for Table operations (select, filter)?

2015-09-18 Thread Aljoscha Krettek
Hi Stefan, I added a section in the documentation that describes the syntax of the expressions. It is a bit bare bones but I hope it helps nonetheless. https://ci.apache.org/projects/flink/flink-docs-master/libs/table.html Cheers, Aljoscha On Wed, 16 Sep 2015 at 14:55 Aljoscha Krettek <al

Re: Window based on tuple timestamps

2015-09-19 Thread Aljoscha Krettek
> Philipp > > > On 18.09.2015 17:05, Aljoscha Krettek wrote: > > Hi Philipp, > am I correct to assume that your tuples do not arrive in the order of the > timestamp that you extract. Unfortunately, for that case the current > windowing implementation does not work corr

Re: Question about DataStream serialization

2015-12-08 Thread Aljoscha Krettek

Re: Question about DataStream serialization

2015-12-09 Thread Aljoscha Krettek
Right now, it is exactly "Object.hash % getNumberOfParallelSubtasks()"... > On 09 Dec 2015, at 02:37, Radu Tudoran wrote: > > Object.hash % getNumberOfParallelSubtasks()

Re: NPE with Flink Streaming from Kafka

2015-12-02 Thread Aljoscha Krettek
Hi Mihail, could you please give some information about the number of keys that you are expecting in the data and how big the elements are that you are processing in the window? Also, are there any other operations that could be taxing on memory? I think the different exception you see for

Re: Getting two types of events from a Window (Trigger)?

2015-12-11 Thread Aljoscha Krettek
Hi Niels, I’m afraid this will not work. (If I understood correctly what you are trying to do.) When the trigger is being serialized/deserialized, each parallel instance of the trigger has its own copy of the QueueSource object. Plus, a separate instance of the QueueSource itself will be

Re: Serialisation problem

2015-12-14 Thread Aljoscha Krettek
Hi, the problem could be that GValue is not Comparable. Could you try making it extend Comparable (the Java Comparable)? Cheers, Aljoscha > On 12 Dec 2015, at 20:43, Robert Metzger wrote: > > Hi, > > Can you check the log output in your IDE or the log files of the Flink

Re: Behaviour of CountWindowAll

2015-12-14 Thread Aljoscha Krettek
Hi Nirmalya, when using count windows the window will trigger after “slide-size” elements have been received. So, since in your example slide-size is set to 1, it will emit a new max for every element received, and once it has accumulated 4 elements it will start removing one element for every new

Re: Behaviour of CountWindowAll

2015-12-14 Thread Aljoscha Krettek
Hi, the current behavior is in fact that the window will be triggered every “slide-size” elements and the computation will take into account the last “window-size” elements. So for a window with window-size 10 and slide-size 5 the window will be triggered every 5 elements. This means that your
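
In code, the described behaviour (types and the field index are placeholders):

    DataStream<Tuple2<String, Integer>> stream = ...; // some non-keyed input
    // Window-size 10, slide-size 5: triggered every 5 elements,
    // computing over (at most) the last 10; the first firings see
    // partially filled windows.
    stream.countWindowAll(10, 5)
          .sum(1);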

Re: Behaviour of CountWindowAll

2016-01-05 Thread Aljoscha Krettek
Hi, I’m afraid this is not possible right now. I’m also not sure about the Evictors as a whole. Using them makes window operations very slow because all elements in a window have to be kept, i.e. window results cannot be pre-aggregated. Cheers, Aljoscha > On 15 Dec 2015, at 12:23, Radu Tudoran

Re: Questions re: ExecutionGraph & ResultPartitions for interactive use a la Spark

2016-01-05 Thread Aljoscha Krettek
Hi, these are certainly valid use cases. As far as I know, the people who know most in this area are on vacation right now. They should be back in a week, I think. They should be able to give you a proper description of the current situation and some pointers. Cheers, Aljoscha > On 04 Jan

Re: Sink - Cassandra

2016-01-05 Thread Aljoscha Krettek
Hi Sebastian, I’m afraid the people working on Flink don’t have much experience with Cassandra. Maybe you could look into the Elasticsearch sink and adapt it to write to Cassandra instead. That could be a valuable addition to Flink. Cheers, Aljoscha > On 22 Dec 2015, at 14:36, syepes

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-25 Thread Aljoscha Krettek
from > getCurrentWatermark() and emitting a watermark at every record) > > If I set > > StreamExecutionEnvironment.setParallelism(5); > > it does not work. > > So, if I understood you correctly, it is the opposite of what you were > expecting?! > > Chee

Re: Working with the Windowing functionality

2015-11-27 Thread Aljoscha Krettek
nds 'onEventTime' calls per second. > > So thank you. I now understand I have to be more careful with these timers!. > > Niels Basjes > > > > On Fri, Nov 27, 2015 at 11:28 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > Hi Niels, > do the records that arrive from Ka

Re: Interpretation of Trigger and Eviction on a window

2015-11-30 Thread Aljoscha Krettek
Hi, the function is in fact applied to the remaining elements (at least I hope it is). So the first sentence should be the correct one. Cheers, Aljoscha > On 28 Nov 2015, at 03:14, Nirmalya Sengupta > wrote: > > Hello Fabian, > > From your reply to this thread:

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-30 Thread Aljoscha Krettek
ink what we will need at some point for this are approximate watermarks > which correlate event and ingest time. > > I think they have similar concepts in Millwheel/Dataflow. > > Cheers, > Gyula > On Mon, Nov 30, 2015 at 5:29 PM Aljoscha Krettek <aljos...@apache.org

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-30 Thread Aljoscha Krettek
Hi, as an addition. I don’t have a solution yet, for the general problem of what happens when a parallel instance of a source never receives elements. This watermark business is very tricky... Cheers, Aljoscha > On 30 Nov 2015, at 17:20, Aljoscha Krettek <aljos...@apache.org> wrote

Re: Interpretation of Trigger and Eviction on a window

2015-11-30 Thread Aljoscha Krettek
Hi, the Evictor is very tricky to understand, I’m afraid. What happens when a Trigger fires is the following:
1. Trigger fires
2. Evictor can remove elements from the window buffer
3. Window function processes the elements that remain in the window buffer
The tricky thing here is that the

Re: Triggering events

2015-11-30 Thread Aljoscha Krettek
Hi, the problem here is that the system needs to be aware that Watermarks will be flowing through the system. You can either do this via: env.setStreamTimeCharacteristic(EventTime); or: env.getConfig().enableTimestamps(); I know, not very intuitive. Cheers, Aljoscha > On 30 Nov 2015, at
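
For the first variant, a sketch of the full calls (enum and method as in the streaming API):

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Make the system aware that timestamps and watermarks will flow through it:
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);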

Re: Working with State example /flink streaming

2015-11-27 Thread Aljoscha Krettek
Hi, I’ll try to go into a bit more detail about the windows here. What you can do is this: DataStream> input = … // fields are (id, sum, count), where count is initialized to 1, similar to word count DataStream> counts = input
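
The archive has eaten the generic type parameters in this snippet. As a hedged reconstruction of the pipeline's shape (the concrete Tuple3 types are guesses based on the inline comment, and the windowing part of the original is cut off):

    // fields are (id, sum, count), with count initialized to 1, as in word count
    DataStream<Tuple3<String, Long, Long>> input = ...;
    DataStream<Tuple3<String, Long, Long>> counts = input
        .keyBy(0) // group by id
        .reduce((a, b) ->
            new Tuple3<>(a.f0, a.f1 + b.f1, a.f2 + b.f2)); // running sum and count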

Re: Working with the Windowing functionality

2015-11-27 Thread Aljoscha Krettek
Hi Niels, do the records that arrive from Kafka already have the session ID or do you want to assign them inside your Flink job based on the idle timeout? For the rest of your problems you should be able to get by with what Flink provides: The triggering can be done using a custom Trigger that

Re: Doubt about window and count trigger

2015-11-27 Thread Aljoscha Krettek
Hi Anwar, what Fabian wrote is completely right. I just want to give the reasoning for why the CountTrigger behaves as it does. The idea was to have Triggers that clearly focus on one thing and then at some point add combination triggers. For example, an OrTrigger that triggers if either of

Re: Session Based Windows

2015-11-18 Thread Aljoscha Krettek
that > events of different sessions cannot be intermingled. Isn't the idea of the > keyBy expression below exactly not to have intermingled sessions by > first grouping by session-ids? > > Cheers and thank you, > > Konstantin > > On 17.10.2015 14:39, Aljoscha Krettek wrote:

Re: finite subset of an infinite data stream

2015-11-18 Thread Aljoscha Krettek
Hi, I wrote a little example that could be what you are looking for: https://github.com/dataArtisans/query-window-example It basically implements a window operator with a modifiable window size that also allows querying the current accumulated window contents using a second input stream.

Re: Custom TimestampExtractor and FlinkKafkaConsumer082

2015-11-19 Thread Aljoscha Krettek
o, if I understood you correctly, it is the opposite of what you were > expecting?! > > Cheers, > > Konstantin > > > On 17.11.2015 11:32, Aljoscha Krettek wrote: >> Hi, >> actually, the bug is more subtle. Normally, it is not a problem that the >> Times

Re: finite subset of an infinite data stream

2015-11-20 Thread Aljoscha Krettek
but have an obstacle with > org.apache.flink.runtime.state.AbstractStateBackend class. Where to get it? I > guess it stored in your local branch only. Would you please to send me > patches for public branch or share the branch with me? > > Best regards, > Roman > > > 20

Re: YARN High Availability

2015-11-19 Thread Aljoscha Krettek
Yes, that’s what I meant. > On 19 Nov 2015, at 12:08, Till Rohrmann <trohrm...@apache.org> wrote: > > You mean an additional start-up parameter for the `start-cluster.sh` script > for the HA case? That could work. > > On Thu, Nov 19, 2015 at 11:54 AM, Aljoscha Kret

Re: YARN High Availability

2015-11-19 Thread Aljoscha Krettek
if > necessary. However, the user is more likely to lose his state when shutting > down the cluster. > > On Thu, Nov 19, 2015 at 10:55 AM, Robert Metzger <rmetz...@apache.org> > wrote: > >> I agree with Aljoscha. Many companies install Flink (and its config) in a &g

Re: ReduceByKeyAndWindow in Flink

2015-11-23 Thread Aljoscha Krettek
Hi, @Konstantin: are you using event-time or processing-time windows. If you are using processing time, then you can only do it the way Fabian suggested. The problem here is, however, that the .keyBy().reduce() combination would emit a new maximum for every element that arrives there and you

Re: Exception using flink-connector-elasticsearch

2016-01-12 Thread Aljoscha Krettek
Hi, could you please try adding the lucene-core-4.10.4.jar file to your lib folder of Flink (https://repo1.maven.org/maven2/org/apache/lucene/lucene-core/4.10.4/)? Elasticsearch uses dependency injection to resolve the classes and Maven is not really aware of this. Also you could try adding

Re: Extracting Timestamp in MapFunction

2016-06-09 Thread Aljoscha Krettek
Hi, could you try pulling the problem apart, i.e. determine at which point in the pipeline you have duplicate data. Is it after the sources or in the CoFlatMap or the Map after the reduce, for example? Cheers, Aljoscha On Wed, 1 Jun 2016 at 17:11 Biplob Biswas wrote:

Re: Scala case classes with a generic parameter

2016-06-09 Thread Aljoscha Krettek
Hi James, the TypeInformation must be available at the call site, not in the case class definition. In your WindowFunction you are using a TestGen[String] so it should suffice to add this line at some point before the call to apply(): implicit val testGenType =

Re: Periodically evicting internal states when using mapWithState()

2016-06-07 Thread Aljoscha Krettek
Hi Jack, right now this is not possible except when writing a custom operator. We are working on support for a time-to-live setting on states; this should solve your problem. For writing a custom operator, check out DataStream.transform() and StreamMap, which is the operator implementation for

Re: java.io.IOException: Couldn't access resultSet

2016-06-06 Thread Aljoscha Krettek
The problem could be that open() is not called with a proper Configuration object in streaming mode. On Sun, 5 Jun 2016 at 19:33 Stephan Ewen wrote: > Hi David! > > You are using the JDBC format that was written for the batch API in the > streaming API. > > While that should

Re: env.fromElements produces TypeInformation error

2016-06-06 Thread Aljoscha Krettek
Hi, I think the problem is that the case class has generic parameters. You can try making TypeInformation for those parameters implicitly available at the call site, i.e.: implicit val typeT = createTypeInformation[T] // where you insert the specific type for T and do the same for the other

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-06-07 Thread Aljoscha Krettek
That's nice. Can you try it on your cluster with an added "reset" call on the buffer? On Tue, 7 Jun 2016 at 14:35 Flavio Pompermaier wrote: > After "some" digging into this problem I'm quite convinced that the > problem is caused by a missing reset of the buffer during the

Re: Window start and end issue with TumblingProcessingTimeWindows

2016-06-07 Thread Aljoscha Krettek
Hi, I'm afraid you're running into a bug in the special processing-time window operator. A suggested workaround would be to switch to characteristic IngestionTime and use TumblingEventTimeWindows. I've also opened a Jira issue for the bug so that we can keep track of it:

Re: NotSerializableException

2016-06-10 Thread Aljoscha Krettek
on. So I applied a MapFunction to DataSet and put a >> dummy value in the join field/key where it was null. Then In the join >> function, I change it back to null. >> >> Best, >> Tarandeep >> >> On Thu, Jun 9, 2016 at 7:06 AM, Aljoscha Krettek <aljos...@apache.o

Re: Using Flink watermarks and a large window state for scheduling

2016-06-09 Thread Aljoscha Krettek
Hi Josh, I'll have to think a bit about that one. Once I have something I'll get back to you. Best, Aljoscha On Wed, 8 Jun 2016 at 21:47 Josh wrote: > This is just a question about a potential use case for Flink: > > I have a Flink job which receives tuples with an event id

Re: NotSerializableException

2016-06-09 Thread Aljoscha Krettek
Hi, the problem is that the KeySelector is an anonymous inner class and as such has a reference to the outer RecordFilterer object. Normally, this would be rectified by the closure cleaner but the cleaner is not used in CoGroup.where(). I'm afraid this is a bug. Best, Aljoscha On Thu, 9 Jun 2016
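
A sketch of the usual workaround until the cleaner covers CoGroup.where(): make the selector a static nested (or top-level) class so it captures no enclosing instance. MyRecord and its id field are stand-ins, not names from the thread:

    import org.apache.flink.api.java.functions.KeySelector;

    // A static nested class holds no hidden reference to the outer object,
    // so serializing it does not drag the enclosing instance along.
    public static class IdSelector implements KeySelector<MyRecord, String> {
        @Override
        public String getKey(MyRecord value) {
            return value.id;
        }
    }

    // usage: first.coGroup(second).where(new IdSelector())...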

Re: No key found restore States

2016-06-03 Thread Aljoscha Krettek
Hi, right now, the way to do it is by using a custom operator, i.e. a OneInputStreamOperator. There you have the low-level control and can set timers based on watermarks or processing time. You can, for example, look at StreamMap for a very simple operator or WindowOperator for an operator that

Re: Event processing time with lateness

2016-06-04 Thread Aljoscha Krettek
Hi Igor, you might be interested in this doc about how we want to improve handling of late data and some other things in the windowing API: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing I've sent it around several times but you can never know

Re: Non blocking operation in Apache flink

2016-05-25 Thread Aljoscha Krettek
5, 2016 at 5:16 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > >> Hi, >> there is no functionality to have asynchronous calls in user functions in >> Flink. >> >> The asynchronous action feature in Spark is also not meant for such >

Re: enableObjectReuse and multiple downstream operators

2016-05-25 Thread Aljoscha Krettek
Hi Bart, yup, this is a bug. AFAIK it is not known; would you like to open the Jira issue for it? If not, I can also open one. The problem is in the interaction of how chaining works in the streaming API with object reuse. As you said, with how it is implemented it serially calls the two map

Re: Incremental updates

2016-05-25 Thread Aljoscha Krettek
server joined and rebalance >> the processing? How is it done if I have a keyed stream and some custom >> ValueState variables? >> >> Cheers, >> Gosia >> >> 2016-05-25 11:32 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>: >> >>> Hi Gosia, &

Re: whats is the purpose or impact of -yst( --yarnstreaming ) argument

2016-05-25 Thread Aljoscha Krettek
Hi Prateek, this is a deprecated setting that affects how memory is allocated in Flink Worker nodes. Since at least 1.0.0 the default behavior is the behavior that would previously be requested by the --yst flag. In short, you don't need the flag when running streaming programs. (Except Robert

Re: Dynamic partitioning for stream output

2016-05-26 Thread Aljoscha Krettek
Hi, while I think it would be possible to do it by creating a "meta sink" that contains several RollingSinks, I think the approach of integrating it into the current RollingSink is better. I think it's mostly a question of style and architectural purity but also of resource consumption and

Re: HBase reads and back pressure

2016-06-13 Thread Aljoscha Krettek
Hi, I'm afraid this is currently a shortcoming in the API. There is this open Jira issue to track it: https://issues.apache.org/jira/browse/FLINK-3869. We can't fix it before Flink 2.0, though, because we have to keep the API stable on the Flink 1.x release line. Cheers, Aljoscha On Mon, 13 Jun

Re: Arrays values in keyBy

2016-06-13 Thread Aljoscha Krettek
Yes, this is correct. Right now we're basically using .hashCode() for keying. (Which can be problematic in some cases.) Beam, for example, clearly specifies that the encoded form of a value should be used for all comparisons/hashing. This is better defined but can lead to slow performance in
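
A short illustration in plain Java of why hashCode()-based keying breaks down for arrays:

    int[] a = {1, 2, 3};
    int[] b = {1, 2, 3};
    // Arrays inherit the identity hashCode, so equal contents hash differently:
    System.out.println(a.hashCode() == b.hashCode());  // false (almost surely)
    System.out.println(java.util.Arrays.hashCode(a)
                       == java.util.Arrays.hashCode(b)); // true: content-based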
