Re: Serialization problem for Guava's TreeMultimap

2016-09-20 Thread Yukun Guo
the issue. > > Thanks for reporting the issue. > > Best, Fabian > > [1] https://issues.apache.org/jira/browse/FLINK-4640 > > 2016-09-20 10:35 GMT+02:00 Yukun Guo <gyk@gmail.com>: > >> Some detail: if running the FoldFunction on a KeyedStream, everything >> wo

Re: Extract type information from SortedMap

2016-07-09 Thread Yukun Guo
rn type can be deduced from the input type(s)". This is true for flatMap(Tuple2<Long, T>, Collector<Tuple2<T, String>>), but if the signature is changed to void flatMap(SortedMap<T, Long>, Collector<Tuple2<T, Long>>), type inference fails. > > O

Extract type information from SortedMap

2016-07-08 Thread Yukun Guo
Hi, When I run the code implementing a generic FlatMapFunction, Flink complained about InvalidTypesException: public class GenericFlatMapper implements FlatMapFunction, Tuple2> { @Override public void flatMap(SortedMap m, Collector>

Re: Tumbling time window cannot group events properly

2016-07-06 Thread Yukun Guo
> buffer for a later window that will be emitted at a future time. > > On Tue, 5 Jul 2016 at 04:35 Yukun Guo <gyk@gmail.com> wrote: > >> The output is the timestamps of events in string. (For convenience, the >> payload of each event is exactly the timestam

Re: Tumbling time window cannot group events properly

2016-07-04 Thread Yukun Guo
orate a bit on what exactly the output means and how > you derive that events are leaking into the previous window? > > On Mon, 4 Jul 2016 at 13:20 Yukun Guo <gyk@gmail.com> wrote: > >> Thanks for the information. Strange enough, after I set the time >> ch

Re: Tumbling time window cannot group events properly

2016-07-04 Thread Yukun Guo
se processing time > if you don't specify a time characteristic. You can enforce using an > event-time window using this: > > stream.window(EventTimeTumblingWindows.of(Time.seconds(10))) > > Cheers, > Aljoscha > > > On Mon, 4 Jul 2016 at 06:00 Yukun Guo <gyk@gmail.c

Re: Strange behavior of DataStream.countWindow

2016-06-11 Thread Yukun Guo
t if invoked multiple times. > The documentation is not discussing this aspect and should be extended. > > Thanks for pointing out this issue. > > Cheers, > Fabian > > > 2016-06-09 13:19 GMT+02:00 Yukun Guo <gyk@gmail.com>: > >> I’m playing with the

Strange behavior of DataStream.countWindow

2016-06-09 Thread Yukun Guo
I’m playing with the (Window)WordCount example from Flink QuickStart. I generate a DataStream consisting of 1000 Strings of random digits, which is windowed with a tumbling count window of 50 elements: import org.apache.flink.api.common.functions.FlatMapFunction;import

Re: Hourly top-k statistics of DataStream

2016-06-06 Thread Yukun Guo
My algorithm is roughly like this taking top-K words problem as an example (the purpose of computing local “word count” is to deal with data imbalance): DataStream of words -> timeWindow of 1h -> converted to DataSet of words -> random partitioning by rebalance -> local “word count” using

Hourly top-k statistics of DataStream

2016-06-06 Thread Yukun Guo
Hi, I'm working on a project which uses Flink to compute hourly log statistics like top-K. The logs are fetched from Kafka by a FlinkKafkaProducer and packed into a DataStream. The problem is, I find the computation quite challenging to express with Flink's DataStream API: 1. If I use something