[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...
Github user mjsax closed the pull request at: https://github.com/apache/flink/pull/1483 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-177069344 I would like to keep this open until [FLINK-2721](https://issues.apache.org/jira/browse/FLINK-2721) is merged, to make sure we do not need this -- in that case, we can also close the corresponding JIRA, of course.
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-176839873 @mjsax What is the status of this PR? Can it be closed, since the feature in Storm will be implemented another way?
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171944496 With embedded mode, I mean using a spout/bolt within a regular Flink streaming program. For this, you could even have a "plain/raw" input stream (eg, `DataStream`) or a POJO. Both can be translated into `StormTuple` (which is only used for input tuples). Output tuples are of type `Values` and are also translated back to the `TupleX` type. Extending the tuples with an additional field would work, too. But I did not like this, because it increases the amount of data that needs to be shipped over the network. For embedded mode, we would need to think of a nice "workaround" for "plain/raw" and POJO input streams -- but this should not be a problem. As there are no globally unique task-ids in embedded mode anyway, it might even be ok not to support this feature... I personally do not see exposing the "input channel index" to the operators as a big (very dangerous) problem. But if you are strictly against it, I will go with the "additional field" approach to provide this information.
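The trade-off described in the message above (tagging every record with its producer's task-id versus exposing the channel index) can be sketched in plain Java. The `TaggedValue` wrapper below is entirely hypothetical and not part of the Flink or Storm APIs; it only illustrates why the extra field increases the per-record wire size:

```java
/**
 * Hypothetical sketch of the "additional field" approach discussed above:
 * every record carries the producer's task-id, which costs extra bytes
 * on the wire for every single element, whether the consumer needs it or not.
 */
public class TaggedValue<T> {
    public final int producerTaskId; // shipped with EVERY record
    public final T value;            // the actual payload

    public TaggedValue(int producerTaskId, T value) {
        this.producerTaskId = producerTaskId;
        this.value = value;
    }

    public static void main(String[] args) {
        // The payload is a plain value; the task-id tag is pure overhead
        // unless the consumer actually uses it.
        TaggedValue<String> record = new TaggedValue<>(42, "payload");
        System.out.println(record.producerTaskId + " -> " + record.value);
    }
}
```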
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171931389 I think that the Storm compatibility layer is very important. I also think, however, that when deciding between enhancing the core API to serve other APIs and making that other API jump through some hoops, the cleanliness of the core API should be favored if possible.
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171930800 What do you mean by embedded usage? Also, isn't there the `StormTuple` that is used as the output of some operations? Then, even if plain Tuples are used, they could have an additional field with the task-id. I still think it could be made to work without putting potentially misleading/dangerous stuff into the general API.
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171928147 We do not have any message wrappers on the wire. Data is transferred as plain Flink `TupleX` types and is translated into Storm format before being handed over to the bolt. This allows for embedded usage of bolts! Thus, I do not want to change this.
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171926900 Hi, I would be strongly against exposing the input channel index in any way to operators/user functions. How the index of a channel maps to an upstream operator is internal to the network layer and depends on several things, some of which are not even deterministic IMHO. Off the top of my head, it should depend on the number of upstream tasks, the parallelism of these upstream tasks, and the order in which the system chooses to attach the channels. I think your goal can be achieved by adding a task-id field to the message wrapper and setting it when emitting elements. Setting this for every processed element (whether using the Storm compatibility layer or not) seems quite wasteful to me.
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171790057 Ok. I will try to rework this to the custom operator approach... About your last statement: We could decide to omit this information in the Storm compatibility layer (and drop this PR completely) -- of course, this would limit the "compatibility level" we can achieve. Btw: Storm does not give the "channel index" itself to the user, but the task-id of the parallel producer task, attached to each input tuple (in Storm, task-ids are unique over all tasks of all operators of the topology). The compatibility layer will translate the "channel index" into this task-id (using the "channel index" is the only way I see to get this done in the compatibility layer).
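The translation described in the message above can be sketched in a few lines. The names below (`upstreamTaskIds`, `toStormTaskId`) are hypothetical; the sketch assumes the compatibility layer knows the ordered list of producer task-ids behind an input gate, so the local channel index picks out exactly one globally unique Storm task-id:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: in Storm, task-ids are unique across the whole
 * topology, so a channel index local to one input gate can be mapped
 * to the producing task if the ordered producer list is known.
 */
public class ChannelIndexToTaskId {

    /** Maps a local channel index to the globally unique Storm task-id. */
    static int toStormTaskId(List<Integer> upstreamTaskIds, int channelIndex) {
        return upstreamTaskIds.get(channelIndex);
    }

    public static void main(String[] args) {
        // Assume the upstream bolt runs with parallelism 3 and was
        // assigned topology-wide task-ids 7, 8, and 9.
        List<Integer> producers = Arrays.asList(7, 8, 9);
        System.out.println(toStormTaskId(producers, 1)); // prints 8
    }
}
```

Note that this is exactly where aljoscha's objection bites: the sketch is only correct if the channel order is stable and known, which the network layer does not guarantee to user code.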
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-171785899 Right, you cannot keep users from writing bad code, of course. But a good way to guide them to writing good code is to expose only information they can actually use in the abstractions provided to them. Exposing certain information and declaring it as "don't use" is a bit like setting them up to make mistakes, and feels like a strong sign that the abstractions were not chosen correctly. If this information is needed, I would really take the custom operator variant. We could also think about whether there is a good way to avoid depending on the channel index information in the first place.
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170244363 I understand your skepticism, but I don't see it as being as dangerous as you do. If somebody accesses this information and does not use it properly, it's his/her own fault... (same as using a static variable in a UDF -- we cannot "protect" users from writing bad code). Nevertheless, originally I had the same idea as you suggest (ie, use `processRecord()`). However, the compatibility layer also uses two `CoFlatMapFunction`s which need to access this information, too:
- https://github.com/apache/flink/blob/master/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/TwoFlinkStreamsMerger.java
- https://github.com/apache/flink/blob/master/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/StormFlinkStreamMerger.java
It would be possible to rewrite both functions as custom operators, but this makes the translation code for multiple input streams more complex. Thus, I would prefer to expose this information in `RuntimeContext`. We can also extend the JavaDoc to warn the user to use this information carefully...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170083592 I am highly skeptical when it comes to putting this into the `RuntimeContext`. If we really need to expose that beyond the `StreamInputProcessor`, it would probably be best to pass this into the operator, as part of the `processRecord()` method. Since Storm bolts are implemented as custom operators, they should be able to pick it up from there.
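The operator-level alternative suggested above can be modeled minimally. The `processRecord(channelIndex, record)` signature below is hypothetical (it mirrors the method name mentioned in the discussion, not an actual Flink interface) and only illustrates how a bolt wrapper could receive the index without it ever reaching `RuntimeContext` or ordinary user functions:

```java
/**
 * Hypothetical sketch: the input processor hands the channel index
 * directly to the operator, so ordinary user functions never see it.
 */
public class OperatorSketch {

    /** Assumed operator interface; not an actual Flink API. */
    interface IndexedOperator<T> {
        void processRecord(int channelIndex, T record);
    }

    /** A bolt wrapper would translate the index before calling user code. */
    static class BoltWrapper implements IndexedOperator<String> {
        final StringBuilder log = new StringBuilder();

        @Override
        public void processRecord(int channelIndex, String record) {
            // Here a real wrapper would map channelIndex to a Storm task-id.
            log.append(channelIndex).append(':').append(record).append(';');
        }
    }

    public static void main(String[] args) {
        BoltWrapper op = new BoltWrapper();
        op.processRecord(0, "a"); // record arriving on channel 0
        op.processRecord(1, "b"); // record arriving on channel 1
        System.out.println(op.log); // prints 0:a;1:b;
    }
}
```

The design point is containment: only code that opts into the custom-operator level sees the channel index, while the general function API stays clean.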
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170073172 If you implement a Bolt, you can retrieve this information from Storm. Thus, the compatibility layer should provide this information, too.
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-170044112 The user code should not deal with which channel an element came from. It wires assumptions about the parallelism of the predecessor into the user code; adjustments of the parallelism become impossible that way. Why does the Storm compatibility layer need this?
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-168748526 Needed for Storm compatibility. This information is made available to the user in Storm. I am not sure why adding it to `RuntimeContext` breaks the abstraction layer (it's only meta information)? Can you explain what you mean by that?
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-168620418 Can you give us a bit of context on why this is needed? Also: the `RuntimeContext` object is what is exposed to user-defined functions. Exposing the number of the channel there breaks the abstraction layer.
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/1483#issuecomment-168464116 Travis failed due to a known unstable test...
GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/1483
[FLINK-1870] Reintroduce indexed reader functionality to streaming
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/mjsax/flink flink-1870-inputChannelIndex
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/flink/pull/1483.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1483
commit 5a8ef6cc8424dbf718e245a1e57845cc9b820059
Author: mjsax
Date: 2015-12-30T11:40:06Z
[FLINK-1870] Reintroduce indexed reader functionality to streaming