Re: [DISCUSS] Project build time and possible restructuring

2017-02-21 Thread Jin Mingjian
The repo splitting is the result of the grown code base. So this will happen finally. The problem is when and how. when: the time point seems not bad. how: is the schema good? I assume we can not add committer per project(or the committer is just a logic concept?). So just splitting into

[jira] [Created] (FLINK-5878) Add stream-stream inner join on TableAPI

2017-02-21 Thread Shaoxuan Wang (JIRA)
Shaoxuan Wang created FLINK-5878: Summary: Add stream-stream inner join on TableAPI Key: FLINK-5878 URL: https://issues.apache.org/jira/browse/FLINK-5878 Project: Flink Issue Type: New

[DISCUSS] Code style / checkstyle

2017-02-21 Thread Dawid Wysakowicz
Hi, I would like to resurrect the discussing ([1] , [2] ) about creating unified code

[jira] [Created] (FLINK-5877) Fix Scala snippet in Async I/O API doc

2017-02-21 Thread Andrea Sella (JIRA)
Andrea Sella created FLINK-5877: --- Summary: Fix Scala snippet in Async I/O API doc Key: FLINK-5877 URL: https://issues.apache.org/jira/browse/FLINK-5877 Project: Flink Issue Type: Improvement

[jira] [Created] (FLINK-5876) Mention Scala type fallacies for queryable state client serializers

2017-02-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5876: -- Summary: Mention Scala type fallacies for queryable state client serializers Key: FLINK-5876 URL: https://issues.apache.org/jira/browse/FLINK-5876 Project: Flink

[jira] [Created] (FLINK-5875) Use TypeComparator.hash instead of Object.hashCode() for keying in DataStream API

2017-02-21 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-5875: - Summary: Use TypeComparator.hash instead of Object.hashCode() for keying in DataStream API Key: FLINK-5875 URL: https://issues.apache.org/jira/browse/FLINK-5875

[jira] [Created] (FLINK-5874) Reject arrays as keys in DataStream API to avoid inconsistent hashing

2017-02-21 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-5874: - Summary: Reject arrays as keys in DataStream API to avoid inconsistent hashing Key: FLINK-5874 URL: https://issues.apache.org/jira/browse/FLINK-5874 Project: Flink

Re: [DISCUSS] Project build time and possible restructuring

2017-02-21 Thread Theodore Vasiloudis
Hello all, >From a library developer POV I think splitting up the project will have more advantages than disadvantages. Api breaking things should move to be the responsibility of library developers, and with automated tests they shouldn't be too hard to catch. I think I'm more fin favor of

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Ovidiu-Cristian MARCU
My case is the following: I have one stream source of elements, each element contains some key. I create a KeyedStream and then window it (so I get a WindowedStream) on top of which I apply some window function. Some numbers to my problem: 1 million records, 1000 keys. I assume parallelism is

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Katherin Eri
Till, thank you for your response. But I need several points to clarify: 1) Yes, batch and batch ML is the field full of alternatives, but in my opinion that doesn’t mean that we should ignore the problem of not developing batch part of Flink. You know: Apache Beam, Apache Mahout they both feel

[DISCUSS] Should we supply a new Iterator instance for Functions with Iterable input(s) like CoGroupFunction ?

2017-02-21 Thread Lin Li
Hi, When I try to implement https://issues.apache.org/jira/browse/FLINK-5498 via "dataset.coGroup(another dataset)" with a generated CoGroupFunction.(CoGroupFunction interface: public void coGroup(Iterable first, Iterable second, Collector out) I couldn't get the right results, then I

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Aljoscha Krettek
I'm afraid that won't work because we also internally use murmur hash on the result of hashCode(). @Ovidiu I still want to understand why you want to use keyBy() for that case. It sounds like you want to use it because you would like to do something else but that is not possible with the Flink

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Greg Hogan
Integer's hashCode is the identity function. Store your slot index in an Integer or IntValue and key off that field. On Tue, Feb 21, 2017 at 6:04 AM, Ovidiu-Cristian MARCU < ovidiu-cristian.ma...@inria.fr> wrote: > Hi, > > As in my example, each key is a window so I want to evenly distributed >

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Stavros Kontopoulos
Ok I see. Suppose we solve all the critical issues. And suppose we dont go with the pure online model (although online ML has a potential)... should we move on with the current ML implementation which is for batch processing (to the best of my knowledge)? The parameter server problem is a long

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Till Rohrmann
Thanks a lot for all your valuable input. It's great to see all your interest in Flink and its ML library :-) 1) Direction of FlinkML In order to reboot the FlinkML library we should indeed first decide on its direction and come up with a roadmap to get the community behind. Since we only have

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Ovidiu-Cristian MARCU
Filled in https://issues.apache.org/jira/browse/FLINK-5873 Best, Ovidiu > On 21 Feb 2017, at 12:00, Ovidiu-Cristian MARCU > wrote: > > Hi Till, > > I will look into filling a jira issue. > > Regarding the

[jira] [Created] (FLINK-5873) KeyedStream: expose an user defined function for key group assignment

2017-02-21 Thread Ovidiu Marcu (JIRA)
Ovidiu Marcu created FLINK-5873: --- Summary: KeyedStream: expose an user defined function for key group assignment Key: FLINK-5873 URL: https://issues.apache.org/jira/browse/FLINK-5873 Project: Flink

Re: [DISCUSS] Table API / SQL indicators for event and processing time

2017-02-21 Thread Fabian Hueske
Hi Xingcan, thanks for your thoughts. In principle you are right that the monotone attribute property would be sufficient, however there are more aspects to consider than that. Flink is a parallel stream processor engine which means that data is processed in separate processes and shuffle across

[jira] [Created] (FLINK-5872) WebUI shows "(null)" root-exception even without exception

2017-02-21 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5872: --- Summary: WebUI shows "(null)" root-exception even without exception Key: FLINK-5872 URL: https://issues.apache.org/jira/browse/FLINK-5872 Project: Flink

[jira] [Created] (FLINK-5871) Enforce uniqueness of pattern names in CEP.

2017-02-21 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-5871: - Summary: Enforce uniqueness of pattern names in CEP. Key: FLINK-5871 URL: https://issues.apache.org/jira/browse/FLINK-5871 Project: Flink Issue Type: Bug

[DISCUSS] Project build time and possible restructuring

2017-02-21 Thread Till Rohrmann
Hi Flink community, I'd like to revive a discussion about Flink's build time and project structure which we already had in some other mailing thread [1] and which we wanted do after the 1.2 release. Recently, we can see that Flink is exceeding more and more often Travis maximum build time of 50

[jira] [Created] (FLINK-5870) Make handlers aware of their REST URLs

2017-02-21 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-5870: --- Summary: Make handlers aware of their REST URLs Key: FLINK-5870 URL: https://issues.apache.org/jira/browse/FLINK-5870 Project: Flink Issue Type:

[jira] [Created] (FLINK-5869) ExecutionGraph use FailoverCoordinator to manage the failover of execution vertexes

2017-02-21 Thread shuai.xu (JIRA)
shuai.xu created FLINK-5869: --- Summary: ExecutionGraph use FailoverCoordinator to manage the failover of execution vertexes Key: FLINK-5869 URL: https://issues.apache.org/jira/browse/FLINK-5869 Project:

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Ovidiu-Cristian MARCU
Hi, As in my example, each key is a window so I want to evenly distributed processing to all slots. If I have 100 keys and 100 slots, for each key I have the same rate of events, I don’t want skewed distribution. Best, Ovidiu > On 21 Feb 2017, at 11:38, Aljoscha Krettek

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Theodore Vasiloudis
Thank you all for your thoughts on the matter. Andrea brought up some further engine considerations that we need to address in order to have a competitive ML engine on Flink. I'm happy to see many people willing to contribute to the development of ML on Flink. The way I see it, there needs to be

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Ovidiu-Cristian MARCU
Hi Till, I will look into filling a jira issue. Regarding the key group assignment, you;re right, there was a mistake in my code, here it is code and distribution: numServers is maxParallelism int numKeys = 1024; HashMap groups = new HashMap

[jira] [Created] (FLINK-5868) Implement a new RestartStrategy that works for the FailoverRegion.

2017-02-21 Thread shuai.xu (JIRA)
shuai.xu created FLINK-5868: --- Summary: Implement a new RestartStrategy that works for the FailoverRegion. Key: FLINK-5868 URL: https://issues.apache.org/jira/browse/FLINK-5868 Project: Flink

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Ovidiu-Cristian MARCU
Hi, Any thoughts on this issue: related to what Till proposed 'to figure a key out whose hashes are uniformly distributed over the key groups’ and a way of exposing the key group assignment through the api? I wonder how other users are facing this issue. Having a small set of keys (related to

[jira] [Created] (FLINK-5866) The implementation of FailoverRegion.

2017-02-21 Thread shuai.xu (JIRA)
shuai.xu created FLINK-5866: --- Summary: The implementation of FailoverRegion. Key: FLINK-5866 URL: https://issues.apache.org/jira/browse/FLINK-5866 Project: Flink Issue Type: Sub-task

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Aljoscha Krettek
Hi Ovidiu, what's the reason for wanting to make the parallelism equal to the number of keys? I think in general it's very hard to ensure that hashes even go to different key groups. It can always happen that all your keys (if you have so few of them) are assigned to the same parallel operator

[jira] [Created] (FLINK-5865) Throw original exception in RocksDB states

2017-02-21 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-5865: --- Summary: Throw original exception in RocksDB states Key: FLINK-5865 URL: https://issues.apache.org/jira/browse/FLINK-5865 Project: Flink Issue Type:

Re: KeyGroupRangeAssignment ?

2017-02-21 Thread Till Rohrmann
Hi Ovidiu, at the moment it is not possible to plugin a user defined hash function/key group assignment function. If you like, then you can file a JIRA issue to add this functionality. The key group assignment in your example looks quite skewed. One question concerning how you calculated it:

[jira] [Created] (FLINK-5864) CEP: fix duplicate output patterns problem.

2017-02-21 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-5864: - Summary: CEP: fix duplicate output patterns problem. Key: FLINK-5864 URL: https://issues.apache.org/jira/browse/FLINK-5864 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-5863) Unify the serialization of queryable list states in different backends

2017-02-21 Thread Xiaogang Shi (JIRA)
Xiaogang Shi created FLINK-5863: --- Summary: Unify the serialization of queryable list states in different backends Key: FLINK-5863 URL: https://issues.apache.org/jira/browse/FLINK-5863 Project: Flink

[jira] [Created] (FLINK-5862) When JobManager fails, TaskManagers do not cancel tasks, but attempt to reconnect to the JobManager, and report states of tasks deployed in

2017-02-21 Thread Biao Liu (JIRA)
Biao Liu created FLINK-5862: --- Summary: When JobManager fails, TaskManagers do not cancel tasks, but attempt to reconnect to the JobManager, and report states of tasks deployed in Key: FLINK-5862 URL: