[jira] [Created] (FLINK-4000) Exception: Could not restore checkpointed state to operators and functions; during Job Restart (Job restart is triggered due to one of the task manager failure)

2016-05-31 Thread Aride Chettali (JIRA)
Aride Chettali created FLINK-4000: Summary: Exception: Could not restore checkpointed state to operators and functions; during Job Restart (Job restart is triggered due to one of the task manager failure) Key: FLINK-4000

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-31 Thread Tzu-Li (Gordon) Tai
I'd like to be added to the Streaming Connectors component (already edited Wiki) :) Ah, naming, one of the hardest problems in programming :P Some comments: I agree with Robert that the name "maintainers" will be somewhat misleading regarding the authoritative difference with committers / PMCs,

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-31 Thread Simone Robutti
Overseer? Supervisor? Warden? 2016-05-31 21:23 GMT+02:00 Robert Metzger: > Good point. I haven't thought about this name clash. > However, I wonder whether it is clear from the context whether we are > talking about pull request and component shepherding. > > Are there

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-31 Thread Chesnay Schepler
so are we discarding the other "shepherd" role then? On 31.05.2016 19:47, Robert Metzger wrote: Hi, to keep this discussion going, I pasted Stephan's Component proposal into the Wiki: https://cwiki.apache.org/confluence/display/FLINK/Components+and+Shepherds Also, I suggest to rename the

[jira] [Created] (FLINK-3999) Rename the `running` flag in the drivers to `canceled`

2016-05-31 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-3999: Summary: Rename the `running` flag in the drivers to `canceled` Key: FLINK-3999 URL: https://issues.apache.org/jira/browse/FLINK-3999 Project: Flink Issue Type:

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-31 Thread Robert Metzger
Hi, to keep this discussion going, I pasted Stephan's Component proposal into the Wiki: https://cwiki.apache.org/confluence/display/FLINK/Components+and+Shepherds Also, I suggest renaming the "maintainer" to "shepherd" to reflect that the committers and the PMC are still in charge and the

[jira] [Created] (FLINK-3998) Access to gauges and counters in StatsDReporter#report() is not properly synchronized

2016-05-31 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3998: Summary: Access to gauges and counters in StatsDReporter#report() is not properly synchronized Key: FLINK-3998 URL: https://issues.apache.org/jira/browse/FLINK-3998 Project: Flink

Re: PojoComparator question

2016-05-31 Thread Stephan Ewen
The "compareSerialized" should probably internally always reuse instances, where possible. Since these are never passed into user code or anything, that should be okay to do. On Tue, May 31, 2016 at 11:52 AM, Aljoscha Krettek wrote: > Hi, > I think this is an artifact from

[jira] [Created] (FLINK-3997) PRNG Skip-ahead

2016-05-31 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3997: Summary: PRNG Skip-ahead Key: FLINK-3997 URL: https://issues.apache.org/jira/browse/FLINK-3997 Project: Flink Issue Type: Improvement Components: Gelly

Re: Collision of task number values for the same task

2016-05-31 Thread Alexander Alexandrov
> (c) You have two operators with the same name that become tasks with the same name. Actually it was a variation on that issue. The problem was that I was reading a dataset X which was part of both the dynamic and the static path of a Flink iteration. I guess the duplicates duplicates these

Re: Collision of task number values for the same task

2016-05-31 Thread Stephan Ewen
It could be that (a) The task failed and was restarted. (b) The program has multiple steps (collect() print()), so that parts of the graph get re-executed. (c) You have two operators with the same name that become tasks with the same name. Do any of those explanations make sense in your

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Stephan Ewen
You are right, Chiwan. I think that this pattern you use should be supported, though. Would be good to check if the job executes at the point of the "collect()" calls more than is necessary. That would explain the network buffer issue then... On Tue, May 31, 2016 at 12:18 PM, Chiwan Park

[jira] [Created] (FLINK-3996) Add addition, subtraction and multiply by scalar to DenseVector.scala and SparseVector.scala

2016-05-31 Thread Daniel Blazevski (JIRA)
Daniel Blazevski created FLINK-3996: Summary: Add addition, subtraction and multiply by scalar to DenseVector.scala and SparseVector.scala Key: FLINK-3996 URL: https://issues.apache.org/jira/browse/FLINK-3996

[jira] [Created] (FLINK-3995) Properly Structure Test Utils and Dependencies

2016-05-31 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3995: Summary: Properly Structure Test Utils and Dependencies Key: FLINK-3995 URL: https://issues.apache.org/jira/browse/FLINK-3995 Project: Flink Issue Type: Bug

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Chiwan Park
Hi Stephan, Yes, right. But KNNITSuite calls ExecutionEnvironment.getExecutionEnvironment only once [1]. I’m testing a change that moves the getExecutionEnvironment call into each test case. [1]:

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Stephan Ewen
Hi Chiwan! I think the Execution environment is not shared, because what the TestEnvironment sets is a Context Environment Factory. Every time you call "ExecutionEnvironment.getExecutionEnvironment()", you get a new environment. Stephan On Tue, May 31, 2016 at 11:53 AM, Chiwan Park
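
A minimal sketch of the behavior described in the message above, written in Python with entirely hypothetical names (`Environment`, `EnvironmentRegistry`, `set_context_factory`, `get_environment`); it is not Flink's implementation or API. It assumes only what the message states: a test harness installs a context factory, and every call to the accessor asks that factory for a new environment, so two callers do not end up sharing one instance.

```python
import itertools
from typing import Callable, Optional


class Environment:
    """Hypothetical stand-in for an execution environment."""

    _ids = itertools.count(1)

    def __init__(self, parallelism: int) -> None:
        # Each instance gets its own id so distinct instances are easy to tell apart.
        self.env_id = next(Environment._ids)
        self.parallelism = parallelism


class EnvironmentRegistry:
    """Hypothetical holder for a context factory, as a test harness might install."""

    _factory: Optional[Callable[[], Environment]] = None

    @classmethod
    def set_context_factory(cls, factory: Callable[[], Environment]) -> None:
        cls._factory = factory

    @classmethod
    def get_environment(cls) -> Environment:
        # With a factory installed, every call builds a fresh environment.
        if cls._factory is not None:
            return cls._factory()
        # Assumed fallback when no factory is installed.
        return Environment(parallelism=1)


# A test harness would install the factory once, up front.
EnvironmentRegistry.set_context_factory(lambda: Environment(parallelism=4))

first = EnvironmentRegistry.get_environment()
second = EnvironmentRegistry.get_environment()
print(first is second)  # prints False: the two calls return different objects
```

Under this assumption, calling the accessor once per suite and calling it once per test case differ only in how many environment objects exist; whether that difference explains the failing test is exactly what the thread is still investigating.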

Re: Side-effects of DataSet::count

2016-05-31 Thread Ovidiu-Cristian MARCU
Hi Stephan and all, A relevant reference may be https://issues.apache.org/jira/browse/FLINK-2250. I agree that your priorities on streaming are very high; it would be a big +1 for the community to create a discussion/place for the design

Re: Collision of task number values for the same task

2016-05-31 Thread Ufuk Celebi
On Tue, May 31, 2016 at 11:53 AM, Alexander Alexandrov wrote: > Can somebody shed a light on the execution semantics of the scheduler which > will explain this behavior? The execution IDs are unique per execution attempt. Having two tasks with the same subtask

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Chiwan Park
I’ve created a JIRA issue [1] related to KNN test cases. I will send a PR for it. From my investigation [2], the cluster for ML tests has only one taskmanager with 4 slots. Is 2048 insufficient for the total number of network buffers? I still think the problem is sharing ExecutionEnvironment between

Collision of task number values for the same task

2016-05-31 Thread Alexander Alexandrov
Hello, I am analyzing the logs from a Flink batch job and am seeing the following two lines: 2016-05-30 15:32:31,701 INFO ...- DataSource (at ${path}) (4/4) (7efe8fcfe9c7c7e6cd4683e1b5c06a3a) switched from SCHEDULED to DEPLOYING 2016-05-30 15:32:31,701 INFO ...- DataSource (at

Re: PojoComparator question

2016-05-31 Thread Aljoscha Krettek
Hi, I think this is an artifact from the past. Using the "non-reuse" deserialize seems more correct, especially in the presence of subclasses. Best, Aljoscha On Mon, 30 May 2016 at 19:13 Gábor Horváth wrote: > Hi! > > While I was working on code generation support for

Re: Side-effects of DataSet::count

2016-05-31 Thread Aljoscha Krettek
That last section is a really good idea! I have several design docs floating around that were announced on the ML. Without a central place to store them they are hard to find, though. -Aljoscha On Tue, 31 May 2016 at 11:27 Stephan Ewen wrote: > Hi! > > There was some

[jira] [Created] (FLINK-3994) Instable KNNITSuite

2016-05-31 Thread Chiwan Park (JIRA)
Chiwan Park created FLINK-3994: Summary: Instable KNNITSuite Key: FLINK-3994 URL: https://issues.apache.org/jira/browse/FLINK-3994 Project: Flink Issue Type: Bug Components: Machine

Re: Side-effects of DataSet::count

2016-05-31 Thread Stephan Ewen
Hi! There was some preliminary work on this. By now, the requirements have grown a bit. The backtracking needs to handle - Scheduling for execution (the point raised here), possibly resuming from available intermediate results - Recovery from partially executed programs, where operators

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Maximilian Michels
Thanks Stephan for the synopsis of last week's test instability madness. It's sad to see the shortcomings of Maven test plugins but another lesson learned is that our testing infrastructure should get a bit more attention. We have reached a point several times where our tests were inherently

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Chiwan Park
I think that the tests fail because the ExecutionEnvironment is shared between test cases. I’m not sure why that is a problem, but it is the only difference from the other ML tests. I created a hotfix and pushed it to my repository. Once it seems fixed [1], I’ll merge the hotfix into the master branch. [1]:

Re: [ANNOUNCE] Build Issues Solved

2016-05-31 Thread Ufuk Celebi
Currently, an ML test is reliably failing and occasionally some HA tests. Is someone looking into the ML test? For HA, I will revert a commit, which might cause the HA instabilities. Till is working on a proper fix as far as I know. On Tue, May 31, 2016 at 3:50 AM, Chiwan Park

[jira] [Created] (FLINK-3993) [py] Add generateSequence() support to Python AP

2016-05-31 Thread Omar Alvarez (JIRA)
Omar Alvarez created FLINK-3993: Summary: [py] Add generateSequence() support to Python AP Key: FLINK-3993 URL: https://issues.apache.org/jira/browse/FLINK-3993 Project: Flink Issue Type: