[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287241#comment-15287241 ] Robert Joseph Evans commented on STORM-1757: OK so that was not clear from the javadocs in state. I should have read through more of the documentation that you sent out. Thanks for putting up with some of my crazy questions. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287098#comment-15287098 ] Arun Mahadevan commented on STORM-1757: --- > So if I ask all bolts to checkpoint and most succeed but one does not, I can > not longer restore everyone to a consistent place, so I can start replaying > again. This is what I am concerned about. The checkpointing handle this via a prepare and a commit phase. - First a "prepare" message is send. If the prepare fails (most of the bolts succeeded but one did not), the checkpoint is restored to the last successful point (rollback) and tuples are replayed. - If prepare succeeds, a "commit" message is send. - If commit fails, the commit is re-attempted by sending the "commit" message again. The bolts that had already committed ignores this and the bolt that had previously not committed would now commit. They can do so because the prepared data is persisted in the state during prepare phase. - Once the commit succeeds the txn is marked as complete. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286933#comment-15286933 ] Robert Joseph Evans commented on STORM-1757: [~arunmahadevan], BEAM does exactly once, by doing at least once, but making sure that the checkpoints happen in an idempotent and coordinated way. The issue is not at least once vs exactly once, but is around failure cases and consistency. From what I have seen with the state API + windowing offers some coordination and restore, reprocess, re-checkpoint should be idempotent, but it can only role back to the last successful checkpoint. So if I ask all bolts to checkpoint and most succeed but one does not, I can not longer restore everyone to a consistent place, so I can start replaying again. This is what I am concerned about. I would like to be able to fully support the BEAM model, and then if we can offer better performance by reducing guarantees I am OK with that. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286896#comment-15286896 ] Arun Mahadevan commented on STORM-1757: --- The state transition doc tries to explain some of this. https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/spout/CheckPointState.java#L25 > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286881#comment-15286881 ] Arun Mahadevan commented on STORM-1757: --- [~revans2] the current checkpointing supports at-least once and may be we could use it as is for the initial prototype if we are ok with the guarantee. The current implementation does not checkpoint the state of the spout, but I think it can be extended to do so either by having the checkpoint spout co-ordinating with user spouts via zookeeper or by having the checkpoint spout act like a co-ordinator with user spouts running as bolts (similar to trident). I will put more thought around this and try to come up with a prototype. I am not sure I understand correctly the requirement for having a commit id in rollback. The current restore happens by discarding any prepared (but un-commited) changes or rolling forward any commits that were in progress. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286713#comment-15286713 ] Robert Joseph Evans commented on STORM-1757: [~satish.duggana], I'm not totally sure if this is the right place to ask this or not. If you want me to move it to STORM-1763 I ma happy to. How do we plan on doing a coordinated restore using the current [State|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/state/State.java] API? I supposed we can extend it so rollback optionally supports a commit id, and there is an API to release commits older than a give commit id that are not needed any more. Should I file a separate JIRA for something like that? Unless we want to start out with the simpler one micro batch at a time processing. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286704#comment-15286704 ] Robert Joseph Evans commented on STORM-1757: [~satish.duggana], That all sounds great. So what is really left is coordination between the sources and a translation layer, followed by lots and lots of optimizations/testing. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286663#comment-15286663 ] Satish Duggana commented on STORM-1757: --- [~arunmahadevan] These features are done in core to have parity with Beam windowing features. Beam runner implementation can reuse this code. Trident will be enhanced later once these are done. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286623#comment-15286623 ] Satish Duggana commented on STORM-1757: --- [~revans2] Currently, windowing support exists for processing time/event time with watermarks with a single trigger/eviction policy other than count based policies. I think we can enhance core windowing for supporting multiple active windows/panes with multiple triggers/custom eviction policies as mentioned in earlier [comment|https://issues.apache.org/jira/browse/STORM-1757?focusedCommentId=15282971=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15282971]. This should include checkpointing the panes. I plan to work next few days on this and add initial thoughts in STORM-1763. Windowing can take the applied functions like reducer/aggregator etc and make appropriate optimizations in maintaining the pane state. Panes with accumulating eviction policies can have optimizations for reducer/aggregator without the need to store all the tuples in the pane state but by storing only the result value with required context. For ex, Avg function needs to store only the avg value and no of elements, these are sufficient in computing the accumulated value for the next pane. We can explore ways to optimize wherever it is possible. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284881#comment-15284881 ] Robert Joseph Evans commented on STORM-1757: [~arunmahadevan], Great to hear that at least some of it can be reused. Will have to dig into it more to see exactly how much we can reuse. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284871#comment-15284871 ] Robert Joseph Evans commented on STORM-1757: I agree that going to a new API and deprecating other APIs is something that we need to do thoughtfully and is not going to happen overnight. Perhaps it warrants it's own JIRA for discussion/exploration. Having multiple APIs is a pain from both a maintenance standpoint and from a user standpoint in deciding what to do. You mean I have to rewrite things if I don't need exactly once any more and I want better performance? It would be nice to be able to support exactly once processing in a unified API with at least once and at most once. Preferably something that is backwards compatible with the existing storm core API. But I really haven't spent much time thinking through how that API would work/look. For the most part I would expect it could look like the check-pointing code we have now, but some de-duping happening in there as well. Either way I think this needs to be on a separate JIRA so we can discuss a long term plan for storm. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284808#comment-15284808 ] P. Taylor Goetz commented on STORM-1757: Storm was designed from the start to support multiple APIs/DSLs, that's why we have the Thrift interface. That's what Storm's Spout/Bolt API is built on, and Trident in turn builds on that. There are Clojure, Python, and F# APIs all built on top of the Thrift interface as well. While I think some sort of "unified" API would be nice, I don't see the Core or Trident APIs going away without causing serious pain to existing users -- while the Core API is arguably more popular, there are a lot of people that use Trident. A good example of how hard it can be to deprecate an API is transactional topologies, which were deprecated in favor of Trident. The problem was that users still wanted and used them, eliminating that API would just force users to reimplement that functionality themselves. The Java 8 Streams API probably doesn't really apply completely or map one-to-one with what's possible with Storm, so we wouldn't actually be implementing those interfaces, but rather provide an API that very closely resembles/approximates the Streams API -- which the Trident does, somewhat, today (e.g. map/flatMap). Anyway, this is tangential to the discussion around implementing a Beam runner and may warrant a separate JIRA or email thread if we want to continue the discussion. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284767#comment-15284767 ] Arun Mahadevan commented on STORM-1757: --- [~revans2] distributed checkpointing implemented on top of storm's acking mechanism is currently used to implement the Storm's Stateful bolts. It currently provides at-least once guarantee and I think it can be enhanced to address the checkpointing requirements. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284755#comment-15284755 ] Robert Joseph Evans commented on STORM-1757: Sorry about the long posts, but I think I am done and have most of what I have been thinking about this week out of my head. Any feedback/suggestions would be greatly appreciated. (I have not thought through merging windows and how the check-pointing would work just yet, but I think it then becomes metadata around the panes and deciding which set of panes to read instead) > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284752#comment-15284752 ] Robert Joseph Evans commented on STORM-1757: To make distributed check pointing work we need more coordination between the Sources and the bolts than ack/fail really offers. For this I would propose that we have each Unbounded Source run as a spout. All of the spouts would need to coordinate with each other so that the barriers can be emitted close to one another, reducing the amount of data that would need to be buffered. We would also need coordination for restoring checkpoints. For this I would propose that we use zookeeper for this. Simply because it is already available and we would not need to add anything new to base storm to support it. All of the spouts would elect a leader through zookeeper. The leader would then trigger all of the spouts to emit a barrier and checkpoint the spout metadata. Because we are going to potentially have multiple checkpoints outstanding at any point in time we will need to label all of the checkpoints. I would label them with two numbers. The first would be the bundle/batch number, the second would be the replay, or generation number. The bundle number would increment for each barrier emitted, but would role back on failure. The generation number would increment for any failure. This would allow downstream bolts to be able to restore a checkpoint just by seeing the bundle id. Spouts would have acking to know if a downstream tuple failed. If an item fails the spout would inform the leader through zookeeper of the bad batch. The leader would then inform all of the other spouts to restore and start again. Each spout would also have to inform the leader periodically when a batch is fully acked. Once all of the spouts inform the leader that all of the tuples are emitted, then the leader can inform the spouts that they can delete the old checkpoints. They should also inform the downstream bolts as part of the barrier so they can clean up their old checkpoints too. We can work out the exact details of how this will work later on. For the actual check pointing in the bolts only the GroupByKey transform would need to do anything, and for simplicity it would checkpoint each pane and key separately, so that the checkpoints are incremental, and so that we can support very large windows without too much difficulty. In general all of this seems totally doable, and actually not that difficult. My biggest concern by far is around efficiency in the check pointing, especially for large windows. The check pointing is something that we need to do, and in the common case should be thrown away. So we want to be sure that we optimize for throwing the data away. We can easily write something that can be backed by HBase or most any nosql store. But that is going to add a lot of iops and network load that I am not too thrilled about. But perhaps it does not really matter for an initial deployment. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284720#comment-15284720 ] Sriharsha Chintalapani commented on STORM-1757: --- [~revans2] can you answer my earlier question about having one API doing both per tuple and batching. With your proposed I can see we are adding more layering and more api but doesn't solve the problem that user faces today. When they come to Storm to develop their application it will be more confusing to see all these options. We can still have a storm-core API as a bottom layer without replacing anything but have Java8 api to-do micro-batching and per tuple. And all the high-level languages can integrate with java8 if some users require it. With this model there will be one api (java8) and if some domain specific languages needed we can still do so. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284682#comment-15284682 ] Robert Joseph Evans commented on STORM-1757: BEAM logically has a few base transforms, that we need to support. We can extend this in the future to support more if we need/want to. * *GroupByKey* - Groups by key and window * CreatePCollectionView * Read.Bounded - only for batch (which we probably can support, just not right now) * *Read.Unbounded* * *ParDo.Bound* * ParDo.BoundMulti * *FlattenPCollectionList* - a merge of PCollections (Streams) * *Window.Bound* The *Bold* ones are what I think we should concentrate on first, and then we can add in support for side inputs and outputs in a second phase, and finally add in support for bounded "batch" processing if we see a need to. Like I stated before fault tolerance in an UnboundedSource is built around a checkpoint restore model. While reading data we can ask the source for metadata that we can then use to restore the source back to a given point in time. Windowing in BEAM requires check pointing and is based around the concept of a pane (which is a part of a window) and triggers. The GroupByKey transform is special in that it not only groups by a key, but also groups by a window. To do this windowed data will not be emitted until a trigger fires to start the processing. And even when the trigger fires there is a choice to discard the panes that have already been processed or to retain them until the trigger indicates that it will never fire again. Triggers usually are based off of a watermark that comes from the source. All ParDos also have a concept around "bundles" that are batches of tuples being processed. The DoFns can be informed of the start and end of a bundle, and in non-optimized cases will be recreated for each bundle. When we combine all of this together, we need a way to checkpoint/restore all of the state in a consistent way across the entire topology, preferably coordinated with the bundles as well. We could do this one of two ways. The simplest approach would be to have a trident like coordinator that # Tells bolts to prepare for a new batch # wait for ack # Tells spouts to emit a new batch # wait for ack # Tells everyone to checkpoint # wait for ack On success repeat. On failure: # Tell everyone to roll back to last successful checkpoint # Wait for ack # Start again This would be a simple way to get things started, but it would be very slow because we cannot have more then one batch of tuples outstanding at any point in time, and even if we can combine a few steps together we will need at least two round trips through the topology to finish a batch. If we want to try and go faster it is going to require more of an exchange of metadata, and check-pointing similar to how flink or apex do it. https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html Perhaps this is something that we should think about once we have a prototype working with the first implementation. More on the distributed check pointing coming shortly... > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284636#comment-15284636 ] Robert Joseph Evans commented on STORM-1757: I have gone through the BEAM API and I think I understand it fairly well now. I have a plan on how we could implement this, but honestly it is a different enough model from trident, that I don't think the two will be able to reuse code, like I stated before. Additionally I have not dug into our windowing code to know how it might work with BEAM, so any feedback on how if it can be reused would be appreciated. Short term I would propose that we go ahead with a BEAM implementation that runs on storm, not trident. Longer term I believe we should move towards a three tiered API approach. This is what most compiler frameworks use, and I think makes a lot of since in this use case too. The lowest level is like assembly language. For storm this would be bolts, spouts, groupings, and possibly some other things around state, check-pointing and coordination. This is what the worker and most of the scheduler sees and executes. There would also be an intermediate representation that describes the logical operations being performed. These would be fairly simple logical operations link map, groupbykey, windowing, etc. An optimizer would take this as input and transform into an optimized solution. Eventually this would be a cost based optimizer that would also take metrics from the running code + hints to understand where data is actually flowing, how skewed is the data, etc to improve the plan over time. A "compiler" would then translate the optimized intermediate representation into the assembled topology. The scheduler could then place the code physically on the cluster it is running on. High level languages like SQL, BEAM, Trident, etc. would translate their operations into the intermediate representation and submit it to nimbus for execution. This would let us keep the regular storm API, but also be able to maintaqin a few "standard" high level languages and support multiple domain specific languages. BEAM design coming shortly. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284266#comment-15284266 ] Arun Mahadevan commented on STORM-1757: --- Multiple active windows what [~satish.duggana] is proposing is good to have to address feature parity with beam and processing of late events, but we should also be aware of the extra complexities it will add in terms of buffering and tracking multiple windows. Custom triggers, eviction will be useful with single active window as well and may be much more simpler to add first. We should also try to make the trident implementation robust in terms of basic event time processing, time windows and address follow on JIRAs. [~sriharsha] idea of unified api based on Java streams api sounds good and if it makes sense lets open a separate JIRA to work on it. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282988#comment-15282988 ] Sriharsha Chintalapani commented on STORM-1757: --- [~revans2] [~arunmahadevan] I agree its larger effort but just adding a BEAM runner at this point will only add one more API that one can use to write a topology. We shouldn't keep it delayed just because its larger effort. I am of the opinion that it will help adoption of the storm and makes it easier for the existing users and in general keep the framework in a better shape with a single API. If the BEAM api has gaps, lets look into Java 8 API if it can answer the common API needs. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282971#comment-15282971 ] Satish Duggana commented on STORM-1757: --- Great idea to have a single API for both core and trident but that may not be along the lines of Beam API. I feel Beam API is too verbose and it is very specific to their model. Agree with [~sriharsha] [~revans2] for pursuing on this. We should find out the gaps with respect to Beam and implement them with respective priority. Our current windowing supports a single active window and it internally supports a single trigger/eviction policy but it is currently not exposed. I raised [STORM-1763](https://issues.apache.org/jira/browse/STORM-1763) to enhance core windowing for supporting multiple active windows with multiple triggers and custom eviction policies. I plan to implement that on core, which can be used in trident later. I have gone through Beam windowing to get the basic understanding but I need to explore more. Initial set of features to be implemented w.r.t windowing to reduce the gap with Beam - multiple active windows based on watermarks - multiple custom triggers support for a window - custom eviction policy (with accumulation/discarding builtin policies) - sessioned windows (this can be implemented by merging active windows with in a session gap) I will be adding more details in STORM-1763. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282938#comment-15282938 ] Robert Joseph Evans commented on STORM-1757: Yes it will be a much larger effort than just doing a BEAM runner. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282475#comment-15282475 ] Arun Mahadevan commented on STORM-1757: --- Unified storm stream api is ideal but could be a bigger effort than doing a beam runner poc with existing storm trident apis or adding minimal extra apis to get started and to identify the gaps. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282079#comment-15282079 ] Hugo Louro commented on STORM-1757: --- If it is at all possible unify the APIs for Storm and Trident that would be ideal in my opinion. Ideally the user should have to implement the logic once, and choose how he would like to run it, either streaming (tuples, i.e. using Storm) or batching (Trident). > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281994#comment-15281994 ] Robert Joseph Evans commented on STORM-1757: I agree that having more than one official Storm API is a support/maintenance issue, and worse it is confusing for customers. But I also see storm as a base platform that others can build APIs/Languages on top of. If someone wants an X compatible API on top of storm we should encourage them, and possibly add in features to core storm to support them, but we don't necessarily pull them into our repo. I see beam and SQL in this context. If you don't like trident, or think we can replace it with some changes to the normal storm API then lets think about what the correct API should be and implement it. If beam really is the correct API and we don't think we can do better, then storm should look at becoming the reference implementation for open source beam, and we should deprecate both Trident and the core storm API. But I personally don't think it is. From what I have seen the beam API is designed very much around ETL. It does that very well. It can handle complex windowing for you and make the job of most of my customers rather simple. This is why I am exploring beam compatibility. But in the case of micro services like what kafka streams appears to be targeting, or in the case IOT, is beam the right API for that? It may grow up to be eventually, but right now it does not appear to be at all. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281907#comment-15281907 ] Robert Joseph Evans commented on STORM-1757: [~sriharsha], Sorry I misread you original comment. You don't want to merge the two you want BEAM to replace trident. That seems possible, but it is definitely not a 1 to 1 translation, and Honestly I find the BEAM API much more difficult to get started on than I do spark, flink, or even the tirdent APIs. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281905#comment-15281905 ] Robert Joseph Evans commented on STORM-1757: I would love to have something like that happen, but I am not sure if the two are even compatible at a fundamental level. I think the Source/Spout APIs are a good example of this. https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/trident/spout/ITridentSpout.java Provides two distinct things a BatchCoordinator and an Emitter. https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java Really just provides a single thing, an UnboundedReader. Both have a bit of metadata that is intended to act as a checkpoint of where they are currently at in processing (X in Trident and CheckpointMark in BEAM), but what they represent is fundamentally different. In Trident X represents a complete batch of data for processing. A start point through to an end point. In BEAM a CheckpointMark is just a place in the processing. It is a spot where a checkpoint can be restored to. There is no end to it, there is no batch. I can turn it into a batch if I stop reading after some amount of time, or after a given amount of data, but then I am just at a new point. State/Sinks are another. In Trident a State https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/trident/state/State.java is a specific thing that can have data written to it and in some cases queried from it. But ultimately it really just comes down to two APIs begineTransaction and commit transaction. Transactions to state are guaranteed to be committed in order. Beam has not such concept the closest logically to it is a Sink, but a Sink is just for batch, in streaming you just use an idempotent ParDo. But in reality every ParDo offers similar lifecycle operations to a State. https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java A DoFn runs inside a ParDo, and has APIs for startBundle and finishBundle. A bundle is an arbitrary batch of data being processed, it seems that in streaming it is intended to be used around check-pointing, but there is no id like with state, so there actually is no guarantee of the order of processing. In fact it is quite the opposite they want to be sure that ParDos can even run in parallel on different machines. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281805#comment-15281805 ] Sriharsha Chintalapani commented on STORM-1757: --- one of the main things that I am interested as part of this work is to unifying the APIs. I can start another discussion around it but here are my early thoughts. Instead of having two APIs 1. Storm core 2. Trident we can use beam to unify this i.e Use the same api to do the per tuple and batching instead of asking users to write two different sets code if they want to do core or trident. Its up to the user's choice if they prefer throughput or latency. We can keep the core api and deprecate the trident for next versions. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281795#comment-15281795 ] Robert Joseph Evans commented on STORM-1757: I have a customer that is potentially interested in this too, but I need to come up with a plan/LOE to give them some advice. For the most part it seems like we should be able to build something fairly quickly, but like you said things don't seem too stable over on the BEAM side. I am going to probe a bit more on the mailing lists to try and understand some of the details in Windowing/Triggering, and side data. There are a few primitive operations that we need to implement a translation/runner for. GroupByKey - Groups by key and window CreatePCollectionView - for side input, not sure how/if this works with streaming Read.Bounded - only for batch (which we probably can support) Read.Unbounded ParDo.Bound ParDo.BoundMulti FlattenPCollectionList Window.Bound I am still working things out, but the main difference between what we have done so far and what we have right now with trident windowing/checkpointing is around the triggers and the watermarks. I don't know the internals of the windowing/checkpointing work in storm well enough to really say that if we had a trident implementation it would work, but I don't think so. I still have a bit more I need to understand before I can come up with a final design. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281767#comment-15281767 ] P. Taylor Goetz commented on STORM-1757: [~revans2], I'm in the same boat wrt the Beam API. It still seems to be unstable at the moment (i.e. no release). {quote} Has any work gone into designing how we would plan to support this? {quote} That's what I'd like to work out here (with this JIRA). There seems to be a lot of interest from multiple committers, so rather than have each develop an implementation in a vacuum it seemed better to start a conversation and collaborate as a group -- essentially to gather thoughts/proposals and go from there. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276656#comment-15276656 ] Robert Joseph Evans commented on STORM-1757: [~ptgoetz] and others, I am fine no matter where we put it for initial development. Putting on my Yahoo hat for a bit. We are interested in helping out on this too, but we have a lot of other work/priorities that we are trying to plan out, at least at a high level. Has any work gone into designing how we would plan to support this? Is it a layer on top of trident with the windowing support added in? I have not had the time to dig into the beam API that fully so I am not really sure what other gotchas might crop up. I really would like to get a handle on this so I can plan accordingly. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269904#comment-15269904 ] P. Taylor Goetz commented on STORM-1757: My initial thoughts: I asked on the beam ml early on where this should be developed, and there was a lot of support for doing so at beam. The more I think about, the more I lean toward doing it here, with the possibility that it might someday be transferred. We just have a lot more collective knowledge of Storm, and are probably better suited to collaborate on an implementation. My other thought is related to timing. It seems prudent to wait until the beam API has settled and there is a release to target, but I'm not opposed to getting started now. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269888#comment-15269888 ] Boyang Jerry Peng commented on STORM-1757: -- I think this would be a worth while project. The google dataflow API is very well thought out. We could potentially increase the user base of Storm by doing this. There is also demand within Yahoo for this feature > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
Never mind my prev comment… got confused by looking at BEAM-216. -roshan On 5/3/16, 4:55 PM, "Roshan Naik"wrote: >Shouldn¹t we have a BEAM runner in Storm project instead of Storm runner >in Beam project ? >-roshan > >On 5/3/16, 1:15 PM, "Hugo Louro (JIRA)" wrote: > >> >>[ >>https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira. >>p >>lugin.system.issuetabpanels:comment-tabpanel=15269498#co >>m >>ment-15269498 ] >> >>Hugo Louro commented on STORM-1757: >>--- >> >>I also would like to work on this project. >> >>> Apache Beam Runner for Storm >>> >>> >>> Key: STORM-1757 >>> URL: https://issues.apache.org/jira/browse/STORM-1757 >>> Project: Apache Storm >>> Issue Type: Brainstorming >>>Reporter: P. Taylor Goetz >>>Priority: Minor >>> >>> This is a call for interested parties to collaborate on an Apache Beam >>>[1] runner for Storm, and express their thoughts and opinions. >>> Given the addition of the Windowing API to Apache Storm, we should be >>>able to map naturally to the Beam API. If not, it may be indicative of >>>shortcomings of the Storm API that should be addressed. >>> [1] http://beam.incubator.apache.org >> >> >> >>-- >>This message was sent by Atlassian JIRA >>(v6.3.4#6332) >> >
Re: [jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
Shouldn¹t we have a BEAM runner in Storm project instead of Storm runner in Beam project ? -roshan On 5/3/16, 1:15 PM, "Hugo Louro (JIRA)"wrote: > >[ >https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.p >lugin.system.issuetabpanels:comment-tabpanel=15269498#com >ment-15269498 ] > >Hugo Louro commented on STORM-1757: >--- > >I also would like to work on this project. > >> Apache Beam Runner for Storm >> >> >> Key: STORM-1757 >> URL: https://issues.apache.org/jira/browse/STORM-1757 >> Project: Apache Storm >> Issue Type: Brainstorming >>Reporter: P. Taylor Goetz >>Priority: Minor >> >> This is a call for interested parties to collaborate on an Apache Beam >>[1] runner for Storm, and express their thoughts and opinions. >> Given the addition of the Windowing API to Apache Storm, we should be >>able to map naturally to the Beam API. If not, it may be indicative of >>shortcomings of the Storm API that should be addressed. >> [1] http://beam.incubator.apache.org > > > >-- >This message was sent by Atlassian JIRA >(v6.3.4#6332) >
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269498#comment-15269498 ] Hugo Louro commented on STORM-1757: --- I also would like to work on this project. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268166#comment-15268166 ] Jungtaek Lim commented on STORM-1757: - I also would like to participate in. Before that, how about discussing at dev@ to make sure potential contributors can also collaborate as well? I guess this could be a great item for GSoC if we release 1.0.0 before starting GSoC, but it's not late for potential contributors. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268155#comment-15268155 ] Arun Mahadevan commented on STORM-1757: --- Would like to collaborate in this effort. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm
[ https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267952#comment-15267952 ] Sriharsha Chintalapani commented on STORM-1757: --- I am working on it here https://issues.apache.org/jira/browse/BEAM-216. Will share the design docs. > Apache Beam Runner for Storm > > > Key: STORM-1757 > URL: https://issues.apache.org/jira/browse/STORM-1757 > Project: Apache Storm > Issue Type: Brainstorming >Reporter: P. Taylor Goetz >Assignee: Sriharsha Chintalapani >Priority: Minor > > This is a call for interested parties to collaborate on an Apache Beam [1] > runner for Storm, and express their thoughts and opinions. > Given the addition of the Windowing API to Apache Storm, we should be able to > map naturally to the Beam API. If not, it may be indicative of shortcomings > of the Storm API that should be addressed. > [1] http://beam.incubator.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)