[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287241#comment-15287241
 ] 

Robert Joseph Evans commented on STORM-1757:


OK so that was not clear from the javadocs in state. I should have read through 
more of the documentation that you sent out.  Thanks for putting up with some 
of my crazy questions.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287098#comment-15287098
 ] 

Arun Mahadevan commented on STORM-1757:
---

> So if I ask all bolts to checkpoint and most succeed but one does not, I can 
> not longer restore everyone to a consistent place, so I can start replaying 
> again. This is what I am concerned about.

The checkpointing handle this via a prepare and a commit phase. 
- First a "prepare" message is send. If the prepare fails (most of the bolts 
succeeded but one did not), the checkpoint is restored to the last successful 
point (rollback) and tuples are replayed.
- If prepare succeeds, a "commit" message is send.
- If commit fails, the commit is re-attempted by sending the "commit" message 
again. The bolts that had already committed ignores this and the bolt that had 
previously not committed would now commit. They can do so because the prepared 
data is persisted in the state during prepare phase.
- Once the commit succeeds the txn is marked as complete. 

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286933#comment-15286933
 ] 

Robert Joseph Evans commented on STORM-1757:


[~arunmahadevan],

BEAM does exactly once, by doing at least once, but making sure that the 
checkpoints happen in an idempotent and coordinated way.  The issue is not at 
least once vs exactly once, but is around failure cases and consistency.  From 
what I have seen with the state API + windowing offers some coordination and 
restore, reprocess, re-checkpoint should be idempotent, but it can only role 
back to the last successful checkpoint.  So if I ask all bolts to checkpoint 
and most succeed but one does not, I can not longer restore everyone to a 
consistent place, so I can start replaying again.  This is what I am concerned 
about.  I would like to be able to fully support the BEAM model, and then if we 
can offer better performance by reducing guarantees I am OK with that.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286896#comment-15286896
 ] 

Arun Mahadevan commented on STORM-1757:
---

The state transition doc tries to explain some of this.

https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/spout/CheckPointState.java#L25


> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286881#comment-15286881
 ] 

Arun Mahadevan commented on STORM-1757:
---

[~revans2] the current checkpointing supports at-least once and may be we could 
use it as is for the initial prototype if we are ok with the guarantee. The 
current implementation does not checkpoint the state of the spout, but I think 
it can be extended to do so either by having the checkpoint spout co-ordinating 
with user spouts via zookeeper or by having  the checkpoint spout act like a 
co-ordinator with user spouts running as bolts (similar to trident). I will put 
more thought around this and try to come up with a prototype. 

I am not sure I understand correctly the requirement for having a commit id in 
rollback. The current restore happens by discarding any prepared (but 
un-commited) changes or rolling forward any commits that were in progress.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286713#comment-15286713
 ] 

Robert Joseph Evans commented on STORM-1757:


[~satish.duggana],

I'm not totally sure if this is the right place to ask this or not.  If you 
want me to move it to STORM-1763 I ma happy to.  How do we plan on doing a 
coordinated restore using the current 
[State|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/state/State.java]
 API?  I supposed we can extend it so rollback optionally supports a commit id, 
and there is an API to release commits older than a give commit id that are not 
needed any more.  Should I file a separate JIRA for something like that?  
Unless we want to start out with the simpler one micro batch at a time 
processing.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286704#comment-15286704
 ] 

Robert Joseph Evans commented on STORM-1757:


[~satish.duggana],

That all sounds great.   So what is really left is coordination between the 
sources and a translation layer, followed by lots and lots of 
optimizations/testing.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Satish Duggana (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286663#comment-15286663
 ] 

Satish Duggana commented on STORM-1757:
---

[~arunmahadevan] These features are done in core to have parity with Beam 
windowing features. Beam runner  implementation can reuse this code.
Trident will be enhanced later once these are done.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-17 Thread Satish Duggana (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286623#comment-15286623
 ] 

Satish Duggana commented on STORM-1757:
---

[~revans2] Currently, windowing support exists for processing time/event time 
with watermarks with a single trigger/eviction policy other than count based 
policies.

I think we can enhance core windowing for supporting multiple active 
windows/panes with multiple triggers/custom eviction policies as mentioned in 
earlier 
[comment|https://issues.apache.org/jira/browse/STORM-1757?focusedCommentId=15282971=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15282971].
 This should include checkpointing the panes. I plan to work next few days on 
this and add initial thoughts in STORM-1763.

Windowing can take the applied functions like reducer/aggregator etc and make 
appropriate optimizations in maintaining the pane state. Panes with 
accumulating eviction policies can have optimizations for reducer/aggregator 
without the need to store all the tuples in the pane state but by storing only 
the result value with required context. For ex, Avg function needs to store 
only the avg value and no of elements, these are sufficient in computing the 
accumulated value for the next pane. We can explore ways to optimize wherever 
it is possible.



> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284881#comment-15284881
 ] 

Robert Joseph Evans commented on STORM-1757:


[~arunmahadevan],

Great to hear that at least some of it can be reused.  Will have to dig into it 
more to see exactly how much we can reuse.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284871#comment-15284871
 ] 

Robert Joseph Evans commented on STORM-1757:


I agree that going to a new API and deprecating other APIs is something that we 
need to do thoughtfully and is not going to happen overnight.  Perhaps it 
warrants it's own JIRA for discussion/exploration.  

Having multiple APIs is a pain from both a maintenance standpoint and from a 
user standpoint in deciding what to do.  You mean I have to rewrite things if I 
don't need exactly once any more and I want better performance? It would be 
nice to be able to support exactly once processing in a unified API with at 
least once and at most once.  Preferably something that is backwards compatible 
with the existing storm core API.  But I really haven't spent much time 
thinking through how that API would work/look.  For the most part I would 
expect it could look like the check-pointing code we have now, but some 
de-duping happening in there as well.  Either way I think this needs to be on a 
separate JIRA so we can discuss a long term plan for storm.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread P. Taylor Goetz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284808#comment-15284808
 ] 

P. Taylor Goetz commented on STORM-1757:


Storm was designed from the start to support multiple APIs/DSLs, that's why we 
have the Thrift interface. That's what Storm's Spout/Bolt API is built on, and 
Trident in turn builds on that. There are Clojure, Python, and F# APIs all 
built on top of the Thrift interface as well.

While I think some sort of "unified" API would be nice, I don't see the Core or 
Trident APIs going away without causing serious pain to existing users -- while 
the Core API is arguably more popular, there are a lot of people that use 
Trident. A good example of how hard it can be to deprecate an API is 
transactional topologies, which were deprecated in favor of Trident. The 
problem was that users still wanted and used them, eliminating that API would 
just force users to reimplement that functionality themselves.

The Java 8 Streams API probably doesn't really apply completely or map 
one-to-one with what's possible with Storm, so we wouldn't actually be 
implementing those interfaces, but rather provide an API that very closely 
resembles/approximates the Streams API -- which the Trident does, somewhat, 
today (e.g. map/flatMap).

Anyway, this is tangential to the discussion around implementing a Beam runner 
and may warrant a separate JIRA or email thread if we want to continue the 
discussion.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284767#comment-15284767
 ] 

Arun Mahadevan commented on STORM-1757:
---

[~revans2] distributed checkpointing implemented on top of storm's acking 
mechanism is currently used to implement the Storm's Stateful bolts. It 
currently provides at-least once guarantee and I think it can be enhanced to 
address the checkpointing requirements.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284755#comment-15284755
 ] 

Robert Joseph Evans commented on STORM-1757:


Sorry about the long posts, but I think I am done and have most of what I have 
been thinking about this week out of my head.  Any feedback/suggestions would 
be greatly appreciated.  (I have not thought through merging windows and how 
the check-pointing would work just yet, but I think it then becomes metadata 
around the panes and deciding which set of panes to read instead)

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284752#comment-15284752
 ] 

Robert Joseph Evans commented on STORM-1757:


To make distributed check pointing work we need more coordination between the 
Sources and the bolts than ack/fail really offers.  For this I would propose 
that we have each Unbounded Source run as a spout.  All of the spouts would 
need to coordinate with each other so that the barriers can be emitted close to 
one another, reducing the amount of data that would need to be buffered.  We 
would also need coordination for restoring checkpoints.  For this I would 
propose that we use zookeeper for this.  Simply because it is already available 
and we would not need to add anything new to base storm to support it.

All of the spouts would elect a leader through zookeeper.  The leader would 
then trigger all of the spouts to emit a barrier and checkpoint the spout 
metadata.  Because we are going to potentially have multiple checkpoints 
outstanding at any point in time we will need to label all of the checkpoints.  
I would label them with two numbers.  The first would be the bundle/batch 
number, the second would be the replay, or generation number.  The bundle 
number would increment for each barrier emitted, but would role back on 
failure.  The generation number would increment for any failure.  This would 
allow downstream bolts to be able to restore a checkpoint just by seeing the 
bundle id.

Spouts would have acking to know if a downstream tuple failed.  If an item 
fails the spout would inform the leader through zookeeper of the bad batch.  
The leader would then inform all of the other spouts to restore and start again.

Each spout would also have to inform the leader periodically when a batch is 
fully acked.  Once all of the spouts inform the leader that all of the tuples 
are emitted, then the leader can inform the spouts that they can delete the old 
checkpoints.  They should also inform the downstream bolts as part of the 
barrier so they can clean up their old checkpoints too.

We can work out the exact details of how this will work later on.

For the actual check pointing in the bolts only the GroupByKey transform would 
need to do anything, and for simplicity it would checkpoint each pane and key 
separately, so that the checkpoints are incremental, and so that we can support 
very large windows without too much difficulty.

In general all of this seems totally doable, and actually not that difficult.  
My biggest concern by far is around efficiency in the check pointing, 
especially for large windows.  The check pointing is something that we need to 
do, and in the common case should be thrown away.  So we want to be sure that 
we optimize for throwing the data away.  We can easily write something that can 
be backed by HBase or most any nosql store.  But that is going to add a lot of 
iops and network load that I am not too thrilled about.  But perhaps it does 
not really matter for an initial deployment.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284720#comment-15284720
 ] 

Sriharsha Chintalapani commented on STORM-1757:
---

[~revans2] can you answer my earlier question about having one API doing both 
per tuple and batching. With your proposed I can see we are adding more 
layering and more api but doesn't solve the problem that user faces today. When 
they come to Storm to develop their application it will be more confusing to 
see all these options. 
We can still have a storm-core API as a bottom layer without replacing anything 
but have Java8 api to-do micro-batching and per tuple. 
And all the high-level languages can integrate with java8 if some users require 
it. With this model there will be one api (java8) and if some domain specific 
languages needed we can still do so.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284682#comment-15284682
 ] 

Robert Joseph Evans commented on STORM-1757:


BEAM logically has a few base transforms, that we need to support.  We can 
extend this in the future to support more if we need/want to.

* *GroupByKey* - Groups by key and window
* CreatePCollectionView
* Read.Bounded - only for batch (which we probably can support, just not right 
now)
* *Read.Unbounded*
* *ParDo.Bound*
* ParDo.BoundMulti
* *FlattenPCollectionList* - a merge of PCollections (Streams)
* *Window.Bound*

The *Bold* ones are what I think we should concentrate on first, and then we 
can add in support for side inputs and outputs in a second phase, and finally 
add in support for bounded "batch" processing if we see a need to.

Like I stated before fault tolerance in an UnboundedSource is built around a 
checkpoint restore model.  While reading data we can ask the source for 
metadata that we can then use to restore the source back to a given point in 
time.

Windowing in BEAM requires check pointing and is based around the concept of a 
pane (which is a part of a window) and triggers.  The GroupByKey transform is 
special in that it not only groups by a key, but also groups by a window.  To 
do this windowed data will not be emitted until a trigger fires to start the 
processing.  And even when the trigger fires there is a choice to discard the 
panes that have already been processed or to retain them until the trigger 
indicates that it will never fire again.  Triggers usually are based off of a 
watermark that comes from the source.

All ParDos also have a concept around "bundles" that are batches of tuples 
being processed.  The DoFns can be informed of the start and end of a bundle, 
and in non-optimized cases will be recreated for each bundle.   

When we combine all of this together, we need a way to checkpoint/restore all 
of the state in a consistent way across the entire topology, preferably 
coordinated with the bundles as well.  We could do this one of two ways.  The 
simplest approach would be to have a trident like coordinator that 

  # Tells bolts to prepare for a new batch
  # wait for ack
  # Tells spouts to emit a new batch
  # wait for ack
  # Tells everyone to checkpoint
  # wait for ack

On success repeat. 

On failure:
  # Tell everyone to roll back to last successful checkpoint
  # Wait for ack
  # Start again

This would be a simple way to get things started, but it would be very slow 
because we cannot have more then one batch of tuples outstanding at any point 
in time, and even if we can combine a few steps together we will need at least 
two round trips through the topology to finish a batch.

If we want to try and go faster it is going to require more of an exchange of 
metadata, and check-pointing similar to how flink or apex do it.

https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html

Perhaps this is something that we should think about once we have a prototype 
working with the first implementation.  More on the distributed check pointing 
coming shortly...

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284636#comment-15284636
 ] 

Robert Joseph Evans commented on STORM-1757:


I have gone through the BEAM API and I think I understand it fairly well now.  
I have a plan on how we could implement this, but honestly it is a different 
enough model from trident, that I don't think the two will be able to reuse 
code, like I stated before.  Additionally I have not dug into our windowing 
code to know how it might work with BEAM, so any feedback on how if it can be 
reused would be appreciated.

Short term I would propose that we go ahead with a BEAM implementation that 
runs on storm, not trident.  Longer term I believe we should move towards a 
three tiered API approach.  This is what most compiler frameworks use, and I 
think makes a lot of since in this use case too.  

The lowest level is like assembly language.  For storm this would be bolts, 
spouts, groupings, and possibly some other things around state, check-pointing 
and coordination.  This is what the worker and most of the scheduler sees and 
executes.

There would also be an intermediate representation that describes the logical 
operations being performed.  These would be fairly simple logical operations 
link map, groupbykey, windowing, etc. An optimizer would take this as input and 
transform into an optimized solution.  Eventually this would be a cost based 
optimizer that would also take metrics from the running code + hints to 
understand where data is actually flowing, how skewed is the data, etc to 
improve the plan over time.  A "compiler" would then translate the optimized 
intermediate representation into the assembled topology.  The scheduler could 
then place the code physically on the cluster it is running on.

High level languages like SQL, BEAM, Trident, etc. would translate their 
operations into the intermediate representation and submit it to nimbus for 
execution.  This would let us keep the regular storm API, but also be able to 
maintaqin a few "standard" high level languages and support multiple domain 
specific languages.

BEAM design coming shortly.



> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-16 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15284266#comment-15284266
 ] 

Arun Mahadevan commented on STORM-1757:
---

Multiple active windows what [~satish.duggana] is proposing is good to have to 
address feature parity with beam and processing of late events, but we should 
also be aware of the extra complexities it will add in terms of buffering and 
tracking multiple windows. Custom triggers, eviction will be useful with single 
active window as well and may be much more simpler to add first. We should also 
try to make the trident implementation robust in terms of basic event time 
processing, time windows and address follow on JIRAs.

[~sriharsha] idea of unified api based on Java streams api sounds good and if 
it makes sense lets open a separate JIRA to work on it.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-13 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282988#comment-15282988
 ] 

Sriharsha Chintalapani commented on STORM-1757:
---

[~revans2] [~arunmahadevan]  I agree its larger effort but just adding a BEAM 
runner at this point will only add one more API that one can use to write a 
topology. We shouldn't keep it delayed just because its larger effort. I am of 
the opinion that it will  help adoption of the storm and makes it easier for 
the existing users and in general keep the framework in a better shape with a 
single API. If the BEAM api has gaps, lets look into Java 8 API if it can 
answer the common API needs.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-13 Thread Satish Duggana (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282971#comment-15282971
 ] 

Satish Duggana commented on STORM-1757:
---

Great idea to have a single API for both core and trident but that may not be 
along the lines of Beam API. I feel Beam API is too verbose and it is very 
specific to their model. Agree with [~sriharsha]  [~revans2] for pursuing on 
this.

We should find out the gaps with respect to Beam and implement them with 
respective priority.

Our current windowing supports a single active window and it internally 
supports a single trigger/eviction policy but it is currently not exposed. I 
raised [STORM-1763](https://issues.apache.org/jira/browse/STORM-1763) to 
enhance core windowing for supporting multiple active windows with multiple 
triggers and custom eviction policies. I plan to implement that on core, which 
can be used in trident later. I have gone through Beam windowing to get the 
basic understanding but I need to explore more.

Initial set of features to be implemented w.r.t windowing to reduce the gap 
with Beam
- multiple active windows based on watermarks
- multiple custom triggers support for a window
- custom eviction policy (with accumulation/discarding builtin policies)
- sessioned windows (this can be implemented by merging active windows with in 
a session gap)

I will be adding more details in STORM-1763.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-13 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282938#comment-15282938
 ] 

Robert Joseph Evans commented on STORM-1757:


Yes it will be a much larger effort than just doing a BEAM runner.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-13 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282475#comment-15282475
 ] 

Arun Mahadevan commented on STORM-1757:
---

Unified storm stream api is ideal but could be a bigger effort than doing a 
beam runner poc with existing storm trident apis or adding minimal extra apis 
to get started and to identify the gaps.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282079#comment-15282079
 ] 

Hugo Louro commented on STORM-1757:
---

If it is at all possible unify the APIs for Storm and Trident that would be 
ideal in my opinion. Ideally the user should have to implement the logic once, 
and choose how he would like to run it, either streaming (tuples, i.e. using 
Storm) or batching (Trident).

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281994#comment-15281994
 ] 

Robert Joseph Evans commented on STORM-1757:


I agree that having more than one official Storm API is a support/maintenance 
issue, and worse it is confusing for customers.  But I also see storm as a base 
platform that others can build APIs/Languages on top of.  If someone wants an X 
compatible API on top of storm we should encourage them, and possibly add in 
features to core storm to support them, but we don't necessarily pull them into 
our repo.  I see beam and SQL in this context.

If you don't like trident, or think we can replace it with some changes to the 
normal storm API then lets think about what the correct API should be and 
implement it.  If beam really is the correct API and we don't think we can do 
better, then storm should look at becoming the reference implementation for 
open source beam, and we should deprecate both Trident and the core storm API.

But I personally don't think it is.  From what I have seen the beam API is 
designed very much around ETL.  It does that very well.  It can handle complex 
windowing for you and make the job of most of my customers rather simple.  This 
is why I am exploring beam compatibility.  But in the case of micro services 
like what kafka streams appears to be targeting, or in the case IOT, is beam 
the right API for that?  It may grow up to be eventually, but right now it does 
not appear to be at all.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281907#comment-15281907
 ] 

Robert Joseph Evans commented on STORM-1757:


[~sriharsha],

Sorry I misread you original comment.  You don't want to merge the two you want 
BEAM to replace trident.  That seems possible, but it is definitely not a 1 to 
1 translation, and Honestly I find the BEAM API much more difficult to get 
started on than I do spark, flink, or even the tirdent APIs.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281905#comment-15281905
 ] 

Robert Joseph Evans commented on STORM-1757:


I would love to have something like that happen, but I am not sure if the two 
are even compatible at a fundamental level.  I think the Source/Spout APIs are 
a good example of this.

https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/trident/spout/ITridentSpout.java

Provides two distinct things a BatchCoordinator and an Emitter.

https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java

Really just provides a single thing, an UnboundedReader.

Both have a bit of metadata that is intended to act as a checkpoint of where 
they are currently at in processing (X in Trident and CheckpointMark in BEAM), 
but what they represent is fundamentally different.  In Trident X represents a 
complete batch of data for processing.  A start point through to an end point.  
In BEAM a CheckpointMark is just a place in the processing.  It is a spot where 
a checkpoint can be restored to.  There is no end to it, there is no batch.  I 
can turn it into a batch if I stop reading after some amount of time, or after 
a given amount of data, but then I am just at a new point.

State/Sinks are another.  In Trident a State 

https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/trident/state/State.java

is a specific thing that can have data written to it and in some cases queried 
from it.  But ultimately it really just comes down to two APIs 
begineTransaction and commit transaction.  Transactions to state are guaranteed 
to be committed in order.

Beam has not such concept the closest logically to it is a Sink, but a Sink is 
just for batch, in streaming you just use an idempotent ParDo. But in reality 
every ParDo offers similar lifecycle operations to a State.

https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/DoFn.java

A DoFn runs inside a ParDo, and has APIs for startBundle and finishBundle.  A 
bundle is an arbitrary batch of data being processed, it seems that in 
streaming it is intended to be used around check-pointing, but there is no id 
like with state, so there actually is no guarantee of the order of processing.  
In fact it is quite the opposite they want to be sure that ParDos can even run 
in parallel on different machines.




> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281805#comment-15281805
 ] 

Sriharsha Chintalapani commented on STORM-1757:
---

one of the main things that I am interested as part of this work is to unifying 
the APIs. I can start another discussion around it but here are my early 
thoughts. Instead of having two APIs 1. Storm core 2. Trident we can use beam 
to unify this i.e 
Use the same api to do the per tuple and batching instead of asking users to 
write two different sets code if they want to do core or trident. Its up to the 
user's choice if they prefer throughput or latency.
We can keep the core api and deprecate the trident for next versions.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281795#comment-15281795
 ] 

Robert Joseph Evans commented on STORM-1757:


I have a customer that is potentially interested in this too, but I need to 
come up with a plan/LOE to give them some advice.

For the most part it seems like we should be able to build something fairly 
quickly, but like you said things don't seem too stable over on the BEAM side.  
I am going to probe a bit more on the mailing lists to try and understand some 
of the details in Windowing/Triggering, and side data.

There are a few primitive operations that we need to implement a 
translation/runner for.

GroupByKey - Groups by key and window
CreatePCollectionView - for side input, not sure how/if this works with 
streaming
Read.Bounded - only for batch (which we probably can support)
Read.Unbounded
ParDo.Bound
ParDo.BoundMulti
FlattenPCollectionList
Window.Bound

I am still working things out, but the main difference between what we have 
done so far and what we have right now with trident windowing/checkpointing is 
around the triggers and the watermarks.  I don't know the internals of the 
windowing/checkpointing work in storm well enough to really say that if we had 
a trident implementation it would work, but I don't think so.  I still have a 
bit more I need to understand before I can come up with a final design.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-12 Thread P. Taylor Goetz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281767#comment-15281767
 ] 

P. Taylor Goetz commented on STORM-1757:


[~revans2], I'm in the same boat wrt the Beam API. It still seems to be 
unstable at the moment (i.e. no release).

{quote}
Has any work gone into designing how we would plan to support this?
{quote}

That's what I'd like to work out here (with this JIRA). There seems to be a lot 
of interest from multiple committers, so rather than have each develop an 
implementation in a vacuum it seemed better to start a conversation and 
collaborate as a group -- essentially to gather thoughts/proposals and go from 
there. 

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276656#comment-15276656
 ] 

Robert Joseph Evans commented on STORM-1757:


[~ptgoetz] and others,

I am fine no matter where we put it for initial development.

Putting on my Yahoo hat for a bit.  We are interested in helping out on this 
too, but we have a lot of other work/priorities that we are trying to plan out, 
at least at a high level.  Has any work gone into designing how we would plan 
to support this?  Is it a layer on top of trident with the windowing support 
added in?  I have not had the time to dig into the beam API that fully so I am 
not really sure what other gotchas might crop up.  I really would like to get a 
handle on this so I can plan accordingly. 

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-03 Thread P. Taylor Goetz (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269904#comment-15269904
 ] 

P. Taylor Goetz commented on STORM-1757:


My initial thoughts:

I asked on the beam ml early on where this should be developed, and there was a 
lot of support for doing so at beam.

The more I think about, the more I lean toward doing it here, with the 
possibility that it might someday be transferred. We just have a lot more 
collective knowledge of Storm, and are probably better suited to collaborate on 
an implementation.

My other thought is related to timing. It seems prudent to wait until the beam 
API has settled and there is a release to target, but I'm not opposed to 
getting started now.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-03 Thread Boyang Jerry Peng (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269888#comment-15269888
 ] 

Boyang Jerry Peng commented on STORM-1757:
--

I think this would be a worth while project.  The google dataflow API  is very 
well thought out.  We could potentially increase the user base of Storm by 
doing this.  There is also demand within Yahoo for this feature

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-03 Thread Roshan Naik
Never mind my prev comment… got confused by looking at BEAM-216.

-roshan


On 5/3/16, 4:55 PM, "Roshan Naik"  wrote:

>Shouldn¹t we have a BEAM runner in Storm project instead of  Storm runner
>in Beam project ?
>-roshan
>
>On 5/3/16, 1:15 PM, "Hugo Louro (JIRA)"  wrote:
>
>>
>>[ 
>>https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.
>>p
>>lugin.system.issuetabpanels:comment-tabpanel=15269498#co
>>m
>>ment-15269498 ] 
>>
>>Hugo Louro commented on STORM-1757:
>>---
>>
>>I also would like to work on this project.
>>
>>> Apache Beam Runner for Storm
>>> 
>>>
>>> Key: STORM-1757
>>> URL: https://issues.apache.org/jira/browse/STORM-1757
>>> Project: Apache Storm
>>>  Issue Type: Brainstorming
>>>Reporter: P. Taylor Goetz
>>>Priority: Minor
>>>
>>> This is a call for interested parties to collaborate on an Apache Beam
>>>[1] runner for Storm, and express their thoughts and opinions.
>>> Given the addition of the Windowing API to Apache Storm, we should be
>>>able to map naturally to the Beam API. If not, it may be indicative of
>>>shortcomings of the Storm API that should be addressed.
>>> [1] http://beam.incubator.apache.org
>>
>>
>>
>>--
>>This message was sent by Atlassian JIRA
>>(v6.3.4#6332)
>>
>



Re: [jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-03 Thread Roshan Naik
Shouldn¹t we have a BEAM runner in Storm project instead of  Storm runner
in Beam project ?
-roshan

On 5/3/16, 1:15 PM, "Hugo Louro (JIRA)"  wrote:

>
>[ 
>https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.p
>lugin.system.issuetabpanels:comment-tabpanel=15269498#com
>ment-15269498 ] 
>
>Hugo Louro commented on STORM-1757:
>---
>
>I also would like to work on this project.
>
>> Apache Beam Runner for Storm
>> 
>>
>> Key: STORM-1757
>> URL: https://issues.apache.org/jira/browse/STORM-1757
>> Project: Apache Storm
>>  Issue Type: Brainstorming
>>Reporter: P. Taylor Goetz
>>Priority: Minor
>>
>> This is a call for interested parties to collaborate on an Apache Beam
>>[1] runner for Storm, and express their thoughts and opinions.
>> Given the addition of the Windowing API to Apache Storm, we should be
>>able to map naturally to the Beam API. If not, it may be indicative of
>>shortcomings of the Storm API that should be addressed.
>> [1] http://beam.incubator.apache.org
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)
>



[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-03 Thread Hugo Louro (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269498#comment-15269498
 ] 

Hugo Louro commented on STORM-1757:
---

I also would like to work on this project.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268166#comment-15268166
 ] 

Jungtaek Lim commented on STORM-1757:
-

I also would like to participate in.
Before that, how about discussing at dev@ to make sure potential contributors 
can also collaborate as well?
I guess this could be a great item for GSoC if we release 1.0.0 before starting 
GSoC, but it's not late for potential contributors.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-02 Thread Arun Mahadevan (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268155#comment-15268155
 ] 

Arun Mahadevan commented on STORM-1757:
---

Would like to collaborate in this effort.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1757) Apache Beam Runner for Storm

2016-05-02 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267952#comment-15267952
 ] 

Sriharsha Chintalapani commented on STORM-1757:
---

I am working on it here https://issues.apache.org/jira/browse/BEAM-216. Will 
share the design docs.

> Apache Beam Runner for Storm
> 
>
> Key: STORM-1757
> URL: https://issues.apache.org/jira/browse/STORM-1757
> Project: Apache Storm
>  Issue Type: Brainstorming
>Reporter: P. Taylor Goetz
>Assignee: Sriharsha Chintalapani
>Priority: Minor
>
> This is a call for interested parties to collaborate on an Apache Beam [1] 
> runner for Storm, and express their thoughts and opinions.
> Given the addition of the Windowing API to Apache Storm, we should be able to 
> map naturally to the Beam API. If not, it may be indicative of shortcomings 
> of the Storm API that should be addressed.
> [1] http://beam.incubator.apache.org



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)