[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-02-07 Thread mjsax
Github user mjsax closed the pull request at:

https://github.com/apache/flink/pull/1483




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-29 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-177069344
  
I would like to keep this open until 
[FLINK-2721](https://issues.apache.org/jira/browse/FLINK-2721) is merged, to 
make sure we do not need this. If it turns out we do not, we can also close 
the corresponding JIRA, of course.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-29 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-176839873
  
@mjsax What is the status of this PR? Can it be closed, since the feature in 
Storm will be implemented another way?




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-15 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171944496
  
With embedded mode, I mean using a spout/bolt within a regular Flink 
streaming program. For this, you could even have a "plain/raw" input stream 
(e.g., `DataStream`) or a POJO. Both can be translated into `StormTuple` 
(which is only used for input tuples). Output tuples are of type `Values` and 
are also translated back to the `TupleX` type.
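
As a minimal illustration of that field-level translation (this is not the 
actual flink-storm wrapper code; class and method names here are made up), 
the input side reads a plain Flink `TupleX` field by field, and the bolt's 
output `Values` are packed back into a `TupleX`:

```java
import java.util.ArrayList;
import java.util.List;

import backtype.storm.tuple.Values;               // Storm's list-of-fields output type
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;

public class TupleTranslationSketch {

    // Input side: read a plain Flink TupleX field by field, which is roughly
    // what the input-tuple wrapper exposes to the embedded bolt.
    static List<Object> fieldsOf(Tuple flinkTuple) {
        List<Object> fields = new ArrayList<>();
        for (int i = 0; i < flinkTuple.getArity(); i++) {
            fields.add(flinkTuple.getField(i));
        }
        return fields;
    }

    // Output side: pack the bolt's emitted Values back into a Flink TupleX
    // (shown for arity 2 only, to keep the sketch short).
    static Tuple2<Object, Object> toFlinkTuple2(Values emitted) {
        return new Tuple2<>(emitted.get(0), emitted.get(1));
    }

    public static void main(String[] args) {
        Tuple2<String, Integer> in = new Tuple2<>("word", 42);
        System.out.println(fieldsOf(in));                          // [word, 42]
        System.out.println(toFlinkTuple2(new Values("word", 42))); // (word,42)
    }
}
```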

Extending the tuples with an additional field would work, too. But I did 
not like this, because it increases the amount of data that needs to be 
shipped over the network. For embedded mode, we would need to think of a nice 
workaround for "plain/raw" and POJO input streams -- but this should not be a 
problem. As there are no globally unique task-ids in embedded mode anyway, it 
might even be ok not to support this feature...

I personally do not see exposing the "input channel index" to the operators 
as a big (very dangerous) problem. But if you are strictly against it, I will 
go with the "additional field" approach to provide this information.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-15 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171931389
  
I think that the Storm compatibility layer is very important. I also think, 
however, that when deciding between enhancing the core API to serve other 
APIs and making that other API jump through some hoops, the cleanliness of 
the core API should be favored if possible.





[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-15 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171930800
  
What do you mean by embedded usage? Also, isn't there the `StormTuple` that 
is used as the output of some operations? Then, even if plain tuples are 
used, they could have an additional field with the task-id. I still think it 
could be made to work without putting potentially misleading/dangerous stuff 
into the general API.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-15 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171928147
  
We do not have any message wrappers on the wire. Data is transferred as 
plain Flink `TupleX` types and is translated into Storm format before handing 
it over to the bolt. This allows for embedded usage of bolts! Thus, I do not 
want to change this.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-15 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171926900
  
Hi,
I would be strongly against exposing the input channel index in any way to 
operators/user functions. How the index of a channel maps to an upstream 
operator is internal to the network layer and depends on several things, some 
of which are not even deterministic IMHO. Off the top of my head, it should 
depend on the number of upstream tasks, the parallelism of these upstream 
tasks, and the order in which the system chooses to attach the channels.

I think your goal can be achieved by adding a task-id field in the message 
wrapper and setting this when emitting elements. Setting this for every 
processed element (whether using the Storm compatibility layer or not) seems 
quite wasteful to me.
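
A rough sketch of that wrapper idea, assuming a hypothetical `TaggedRecord` 
class (it does not exist in Flink or flink-storm): the producer tags each 
emitted element with its own subtask index as a stand-in for a Storm-style 
task-id, so downstream code never needs the channel index.

```java
import org.apache.flink.api.common.functions.RichMapFunction;

// Hypothetical wrapper carrying the payload plus the id of the producing task.
public class TaggedRecord<T> {
    public int producerTaskId;
    public T payload;

    public TaggedRecord() {}                       // default constructor for serialization

    public TaggedRecord(int producerTaskId, T payload) {
        this.producerTaskId = producerTaskId;
        this.payload = payload;
    }
}

// Tags each element with the index of the producing subtask when emitting.
class TagWithProducerTask<T> extends RichMapFunction<T, TaggedRecord<T>> {
    @Override
    public TaggedRecord<T> map(T value) {
        return new TaggedRecord<>(getRuntimeContext().getIndexOfThisSubtask(), value);
    }
}
```

Note that a subtask index is only unique within one operator, so the 
compatibility layer would still have to map it to a globally unique id.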




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-14 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171790057
  
Ok. I will try to rework this into the custom operator approach...

About your last statement: We could decide to omit this information in the 
Storm compatibility layer (and drop this PR completely) -- of course, this 
would limit the "compatibility level" we can achieve.

Btw: Storm does not give the "channel index" itself to the user, but the 
task-id of the parallel producer task attached to each input tuple (in Storm, 
task-ids are unique over all tasks of all operators of the topology). The 
compatibility layer will translate the "channel index" into this task-id 
(using the "channel index" is the only way I see to get this done in the 
compatibility layer).
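
For reference, this is roughly how a bolt reads that task-id through Storm's 
(pre-1.0, `backtype.storm`) API; the bolt class itself is just a made-up 
example:

```java
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

// Sketch of a bolt that reads the producer's task-id from each input tuple --
// the information the compatibility layer would have to reconstruct from the
// channel index on the Flink side.
public class ProducerAwareBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Storm attaches the globally unique id of the producing task to every tuple.
        int producerTaskId = input.getSourceTask();
        System.out.println("tuple from task " + producerTaskId + ": " + input.getValues());
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output streams declared in this sketch.
    }
}
```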




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-171785899
  
Right, you cannot keep users from writing bad code, of course.

But a good way to guide them toward writing good code is to expose only 
information they can actually use in the abstractions provided to them. 
Exposing certain information and declaring it as "don't use" is a bit like 
setting them up to make mistakes, and feels like a strong sign that the 
abstractions are not chosen correctly.

If this information is needed, I would really go with the custom operator 
variant. We could also think about whether there is a good way to avoid 
depending on that channel index information in the first place.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-09 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170244363
  
I understand your skepticism, but I don't see it as being as dangerous as you 
do. If somebody accesses this information and does not use it properly, it's 
his/her own fault... (same as using a static variable in a UDF -- we cannot 
"protect" users from writing bad code).

Nevertheless, I originally had the same idea as you suggest (i.e., use 
`processRecord()`). However, the compatibility layer also uses two 
`CoFlatMapFunction`s which need to access this information, too:

 - 
https://github.com/apache/flink/blob/master/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/TwoFlinkStreamsMerger.java
 - 
https://github.com/apache/flink/blob/master/flink-contrib/flink-storm/src/main/java/org/apache/flink/storm/api/StormFlinkStreamMerger.java

It would be possible to rewrite both functions as custom operators, but this 
makes the translation code for multiple input streams more complex. Thus, I 
would prefer to expose this information in `RuntimeContext`. We can also 
extend the JavaDoc to warn users to use this information carefully...
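
For context, a bare `CoFlatMapFunction` skeleton (not the actual merger code 
linked above) showing the limitation: `flatMap1`/`flatMap2` only tell the 
function which logical input a record belongs to, not which parallel producer 
subtask (i.e., which input channel) it came from:

```java
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Skeleton illustrating the limitation discussed above: the function knows the
// logical input of each record, but not the producing subtask / input channel.
public class MergerSkeleton<IN1, IN2> implements CoFlatMapFunction<IN1, IN2, Object> {

    @Override
    public void flatMap1(IN1 value, Collector<Object> out) {
        // Known: the record arrived on the first input.
        // Unknown without further support: the index of the producing subtask.
        out.collect(value);
    }

    @Override
    public void flatMap2(IN2 value, Collector<Object> out) {
        // Same for the second input.
        out.collect(value);
    }
}
```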




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170083592
  
I am a big-time skeptic when it comes to putting this into the 
`RuntimeContext`.

If we really need to expose that beyond the `StreamInputProcessor`, it 
would probably be best to pass this into the operator, as part of the 
"processRecord()" method.

Since StormBolts are implemented as custom operators, they should be able 
to pick it up from there.
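
A purely hypothetical sketch of that suggestion -- the interface below does 
not exist in Flink, and the name and signature are made up for illustration. 
The idea is that the `StreamInputProcessor`, which knows which channel a 
record was read from, hands that index to the operator together with the 
record instead of exposing it through the `RuntimeContext`:

```java
// Hypothetical operator-facing interface; not an actual Flink API.
public interface ChannelAwareOperator<IN> {

    /**
     * @param record       the deserialized record
     * @param channelIndex index of the input channel the record arrived on
     */
    void processRecord(IN record, int channelIndex) throws Exception;
}
```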




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-08 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170073172
  
If you implement a bolt, you can retrieve this information from Storm. Thus, 
the compatibility layer should provide this information, too.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-08 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-170044112
  
User code should not deal with which channel an element came from. It wires 
assumptions about the parallelism of the predecessor into the user code, and 
adjusting the parallelism becomes impossible that way.

Why does the Storm compatibility layer need this?




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-04 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-168748526
  
Needed for Storm compatibility. This information is made available to the 
user in Storm. I am not sure why adding it to `RuntimeContext` breaks the 
abstraction layer (it's only meta information). Can you explain what you mean 
by that?




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-168620418
  
Can you give us a bit of context why this is needed?

Also: the `RuntimeContext` object is what is exposed to user-defined 
functions. Exposing the channel number there breaks the abstraction layer.




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-02 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/1483#issuecomment-168464116
  
Travis failed due to a known unstable test...




[GitHub] flink pull request: [FLINK-1870] Reintroduce indexed reader functi...

2016-01-02 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/flink/pull/1483

[FLINK-1870] Reintroduce indexed reader functionality to streaming



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/flink flink-1870-inputChannelIndex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1483.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1483


commit 5a8ef6cc8424dbf718e245a1e57845cc9b820059
Author: mjsax 
Date:   2015-12-30T11:40:06Z

[FLINK-1870] Reintroduce indexed reader functionality to streaming



