fromParallelCollection

2016-09-02 Thread rimin515
Hi,val env = StreamExecutionEnvironment.getExecutionEnvironment val tr = env.fromParallelCollection(data) the data i do not know initialize,some one can tell me..

Memory Management in Streaming?

2016-09-02 Thread Shaosu Liu
Hi, I have had issues when I processed large amount of data (large windows where I could not do incremental updates), flink slowed down significantly. It did help when I increased the amount of memory and used off heap allocation. But it only delayed the onset of the probelm without solving it.

Re: Apache Flink: How does it handle the backpressure?

2016-09-02 Thread rss rss
Hi, some time ago I found a problem with backpressure in Spark and prepared a simple test to check it and compare with Flink. https://github.com/rssdev10/spark-kafka-streaming +

Re: How to get latency info from benchmark

2016-09-02 Thread Eric Fukuda
Thanks Robert, I tried to checkout the commit you mentioned, but git returns an error "fatal: reference if not a tree: 547e7490fb99562ca15a2127f0ce1e784db97f3e". I've searched for a solution but could not find any. Am I doing something wrong? - $ git clone

Re: Firing windows multiple times

2016-09-02 Thread Shannon Carey
Of course! I really appreciate your interest & attention. I hope we will figure out solutions that other people can use. I agree with your analysis. Your triggering syntax is particularly nice. I wrote a custom trigger which does exactly that but without the nice fluent API. As I considered

Re: emit a single Map<String, T> per window

2016-09-02 Thread Luis Mariano Guerra
On Fri, Sep 2, 2016 at 5:24 PM, Aljoscha Krettek wrote: > Hi, > from this I would expect to get as many HashMaps as you have keys. The > winFunction is also executed per-key so it cannot combine the HashMaps of > all keys. > > Does this describe the behavior that you're

Re: emit a single Map<String, T> per window

2016-09-02 Thread Aljoscha Krettek
Hi, from this I would expect to get as many HashMaps as you have keys. The winFunction is also executed per-key so it cannot combine the HashMaps of all keys. Does this describe the behavior that you're seeing? Cheers, Aljoscha On Fri, 2 Sep 2016 at 17:37 Luis Mariano Guerra

Re: How to get latency info from benchmark

2016-09-02 Thread Robert Metzger
Hi Eric, I'm sorry that you are running into these issues. I think the version is 0.10-SNAPSHOT, and I think I've used this commit: https://github.com/rmetzger/flink/commit/547e749 for some of the runs (of the throughput / latency tests, not for the yahoo benchmark). The commit should at least

Re: How to get latency info from benchmark

2016-09-02 Thread Eric Fukuda
Hi Robert, I've been trying to build the "performance" project using various versions of Flink, but failing. It seems that I need both KafkaZKStringSerializer class and FlinkKafkaConsumer082 class to build the project, but none of the branches has both of them. KafkaZKStringSerializer existed in

Re: Flink Iterations vs. While loop

2016-09-02 Thread Greg Hogan
Hi Dan, Where are you reading the 200 GB "data" from? How much memory per node? If the DataSet is read from a distributed filesystem and if with iterations Flink must spill to disk then I wouldn't expect much difference. About how many iterations are run in the 30 minutes? I don't know that this

[DISCUSS] Storm 1.x.x support in the compatibility layer

2016-09-02 Thread Maximilian Michels
This should be of concern mostly to the users of the Storm compatibility layer: We just received a pull request [1] for updating the Storm compatibility layer to support Storm versions >= 1.0.0. This is a major change because all Storm imports have changed their namespace due to package renaming.

Remote upload and execute

2016-09-02 Thread Paul Wilson
Hi, I'd like to write a client that can execute an already 'uploaded' JAR (i.e. the JAR is deployed and available by some other external process). This is similar to what the web console allows which consists of 2 steps: upload the JAR followed by a submit with parameters. I'm looking at the

checkpoints not removed on hdfs.

2016-09-02 Thread Dong-iL, Kim
Hi, I’m using HDFS as state backend. The checkpoints folder grows bigger every moments. What shall I do? Regards.

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-02 Thread ANDREA SPINA
Hi Stefan, Thank you so much for the answer. Ok, I'll do it asap. For the sake of argument, could the issue be related to the low number of blocks? I noticed the Flink implementation, as default, set the number of blocks to the input count (which is actually a lot). So with a low cardinality and

Storing JPMML-Model Object as a Variable Closure?

2016-09-02 Thread Bauss, Julian
Hello Everybody, I’m currently refactoring some code and am looking for a better alternative to handle JPMML-Models in data streams. At the moment the flink job I’m working on references a model-object as a Singleton which I want to change because static references tend to cause problems in

Re: Firing windows multiple times

2016-09-02 Thread Aljoscha Krettek
I see, I didn't forget about this, it's just that I'm thinking hard. I think in your case (which I imagine some other people to also have) we would need an addition to the windowing system that the original Google Dataflow paper called retractions. The problem is best explained with an example.

Re: FlinkML ALS matrix factorization: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: null

2016-09-02 Thread Stefan Richter
Hi, unfortunately, the log does not contain the required information for this case. It seems like a sender to the SortMerger failed. The best way to find this problem is to take a look to the exceptions that are reported in the web front-end for the failing job. Could you check if you find any