Re: Multiple operations on a WindowedStream

2016-04-08 Thread Aljoscha Krettek
Hi, the sources don't report records consumed. This is a bit confusing but the records sent/records consumed statistics only talk about Flink-internal sending of records, so a Kafka source would only show sent records. To really see each operator in isolation you should disable chaining for these

Re: Access an event's TimeWindow?

2016-04-08 Thread Aljoscha Krettek
Hi, for cases like these there is a family of "apply" methods on WindowedStream that also take an incremental aggregation function. For example, there is .apply(ReduceFunction, WindowFunction). What this will do is incrementally aggregate the contents of the window. When the window result should
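
The idea of pairing an incremental reduce with a window function can be sketched outside of Flink. The following is plain Python, not the Flink API: `TimeWindow`, `window_for`, and `reduce_then_apply` are hypothetical names made up for illustration, and fixed-size tumbling windows over integer timestamps are assumed. Only one reduced value is kept per window, and the final function receives that value together with the window it belongs to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class TimeWindow:
    """Hypothetical stand-in for a time window: [start, end)."""
    start: int
    end: int


def window_for(timestamp: int, size: int) -> TimeWindow:
    """Assign a timestamp to the tumbling window that contains it."""
    start = timestamp - (timestamp % size)
    return TimeWindow(start, start + size)


def reduce_then_apply(
    events: Iterable[Tuple[int, int]],
    size: int,
    reduce_fn: Callable[[int, int], int],
    window_fn: Callable[[TimeWindow, int], tuple],
) -> List[tuple]:
    """Keep one incrementally reduced value per window, then hand
    each (window, reduced value) pair to window_fn."""
    state: Dict[TimeWindow, int] = {}
    for timestamp, value in events:
        window = window_for(timestamp, size)
        if window in state:
            # Incremental step: fold the new value into the running aggregate.
            state[window] = reduce_fn(state[window], value)
        else:
            state[window] = value
    # "Firing": emit the aggregate together with its window metadata.
    return [window_fn(w, state[w]) for w in sorted(state, key=lambda w: w.start)]


events = [(1, 5), (3, 7), (12, 1), (14, 2), (25, 10)]
results = reduce_then_apply(
    events,
    size=10,
    reduce_fn=lambda a, b: a + b,
    window_fn=lambda w, total: (w.start, w.end, total),
)
print(results)
```

The point of the split is that the per-window state stays small (a single value), while the window boundaries are still available when the result is emitted.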

Re: WindowedStream sum behavior

2016-04-08 Thread Aljoscha Krettek
Hi, there are no guarantees for the fields other than the summed field (and any key fields). I think in practice it's either the fields from the first or last record but I wouldn't rely on that. Cheers, Aljoscha On Sat, 9 Apr 2016 at 03:19 Elias Levy wrote: > I am
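
If the non-summed fields matter, one option is to state the policy explicitly in a reduce step instead of relying on what sum happens to keep. The following is a plain Python sketch of that idea, not the Flink API: `Reading` and `sum_keep_last_label` are hypothetical names, and "keep the label of the latest record" is just one possible policy chosen for illustration.

```python
from functools import reduce
from typing import NamedTuple


class Reading(NamedTuple):
    """Hypothetical record: a key, a non-summed field, and the summed field."""
    key: str
    label: str
    value: int


def sum_keep_last_label(acc: Reading, nxt: Reading) -> Reading:
    """Sum `value`; explicitly take `label` from the most recent record."""
    return Reading(acc.key, nxt.label, acc.value + nxt.value)


window_contents = [
    Reading("a", "first", 1),
    Reading("a", "middle", 2),
    Reading("a", "last", 3),
]

result = reduce(sum_keep_last_label, window_contents)
print(result)
```

Swapping `nxt.label` for `acc.label` would keep the first record's label instead; either way the choice is visible in the code rather than left to the implementation.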

RE: Multiple operations on a WindowedStream

2016-04-08 Thread Kanak Biscuitwala
It turns out that the problem is deeper than I originally thought. The Flink dashboard reports that 0 records are being consumed, which is quite odd. Is there some issue with the 0.9 consumer on YARN? From: aljos...@apache.org Date: Thu, 7 Apr 2016 09:56:42 + Subject: Re: Multiple

Access an event's TimeWindow?

2016-04-08 Thread Elias Levy
Is there an API to access an event's time window? When you are computing aggregates over a time window, you usually want to output the window along with the aggregate. I could compute the Window on my own, but this seems like an API that should exist.

WindowedStream sum behavior

2016-04-08 Thread Elias Levy
I am wondering about the expected behavior of the sum method. Obviously it sums a specific field in a tuple or POJO. But what should one expect in the other fields? Does sum keep the fields of the first record, of the last record, or are there no guarantees?

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-08 Thread Trevor Grant
I'm just about to open an issue / PR solution for 'warm-starts'. Once this is in, we could just add a setter for the weight vector (and whatever iteration you're on if you're going to do more partial fits). Then all you need to save is your weight vector (and iter number). Trevor Grant Data

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-08 Thread Behrouz Derakhshan
Is there a reason the Predictor and Estimator classes don't have read and write methods for saving and retrieving the model? I couldn't find Jira issues for it. Does it make sense to create one? BR, Behrouz On Wed, Mar 30, 2016 at 4:40 PM, Till Rohrmann wrote: > Yes Suneel

Re: FromIteratorFunction problems

2016-04-08 Thread Andrew Whitaker
Thanks, that example is helpful. It seems that to use `fromCollection` with an iterator, it must be an iterator that implements Serializable, and Java's built-in `Iterator`s don't, unfortunately. On Thu, Apr 7, 2016 at 6:11 PM, Chesnay Schepler wrote: > hmm, maybe i was to

Re: Yammer metrics

2016-04-08 Thread Sanne de Roever
O.k., sounds nice; didn't mean to impose: my train of thought was a bit murky, my apologies for that. Although I had prior experience with JMX, and none with Yammer, at this moment it is all new again. Cleaning up my thinking a bit, the following picture, not directly Flink related, comes up. For

Re: Handling large state (incremental snapshot?)

2016-04-08 Thread Hironori Ogibayashi
Thank you for your suggestion. Regarding throughput, there was actually a bottleneck in the process that puts logs into Kafka. When I added more processes, the throughput increased. Also, HyperLogLog seems like a good solution in this case. I will try it. Regards, Hironori 2016-04-07 17:45

Re: Yammer metrics

2016-04-08 Thread Chesnay Schepler
Note that we still /could/ expose the option of using the yammer reporters; there isn't any technical limitation as of now that would prohibit that. On 08.04.2016 13:05, Chesnay Schepler wrote: I'm very much aware of how Yammer works. As the slides you linked show (near the end) is that

Re: Yammer metrics

2016-04-08 Thread Chesnay Schepler
I'm very much aware of how Yammer works. As the slides you linked show (near the end), there are several small issues with the general-purpose reporters offered by Yammer. Instead of hacking around those issues I would very much prefer creating our own reporters that are, again,

Re: Yammer metrics

2016-04-08 Thread Sanne de Roever
I forgot to add some extra information, still all tentative. Earlier (erm, 14 years ago to be honest), I also worked on a JMX monitoring system, and it turned out to be a pain to identify all the components, write the correct JMX queries and then plot everything. It was hard. This presentation

Re: Yammer metrics

2016-04-08 Thread Sanne de Roever
Thanks Chesnay. Might I make a tentative case for Yammer? I'm not an expert, but I am currently trying to pull together information on this and was reviewing jmxtrans. This is all tentative; I just dived in a few days ago. Please find the information below. Using Yammer it is possible to load

Re: Joda DateTimeSerializer

2016-04-08 Thread Robert Metzger
Hi Stefano, your fix is the right way to resolve the issue ;) If you want, give me your Confluence Wiki username and I'll give you edit permissions in our wiki. Otherwise, I'll quickly add a note to the migration guide. On Fri, Apr 8, 2016 at 11:28 AM, Stefano Bortoli wrote:

Re: Integrate Flink with S3 on EMR cluster

2016-04-08 Thread Robert Metzger
Hi Timur, the Flink optimizer runs on the client, so the exception is thrown from the JVM running the ./bin/flink client. Since the statistics sampling is an optional step, it's surrounded by a try / catch block that just logs the error message. More answers inline below. On Thu, Apr 7, 2016 at

Joda DateTimeSerializer

2016-04-08 Thread Stefano Bortoli
Hi to all, we've just upgraded to Flink 1.0.0 and we had some problems with Joda DateTime serialization. The problem was caused by FLINK-3305, which removed the JavaKaffee dependency. We had to re-add that dependency in our application and then register the DateTime serializer in the environment:

Re: Yammer metrics

2016-04-08 Thread Chesnay Schepler
Flink currently doesn't expose any metrics beyond those shown in the Dashboard. I am currently working on integrating a new metrics system that is partly /based/ on Yammer/Codahale/Dropwizard metrics. For a first version it is planned to export metrics only via JMX as this effectively

Yammer metrics

2016-04-08 Thread Sanne de Roever
Hi, I'm looking into setting up monitoring for our (Flink) environment and realized that both Kafka and Cassandra use the yammer metrics library. This library enables the direct export of all metrics to Graphite (and possibly statsd). Does Flink use Yammer metrics? Cheers, Sanne