Re: Suggestion for Writing Sink Implementation

2016-07-29 Thread Raghu Angadi
On Fri, Jul 29, 2016 at 12:56 PM, Dan Halperin wrote: > > BEAM-452 and https://github.com/apache/incubator-beam/pull/690 > > Raghu, do you see this cache as necessary once that work is in? Nope. I didn't realize the feature is close to being merged. Thanks!

Re: Suggestion for Writing Sink Implementation

2016-07-29 Thread Raghu Angadi
KafkaIO: we should simply cache the producer and reuse it within a JVM. A simple approach I plan to implement is to use a cache that expires after a few minutes of inactivity. We could assign a unique id to each sink when it is created so that we would use the right Kafka producer configuration even when
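
A minimal sketch of the idle-expiry cache described above, in Java. The class name IdleExpiringCache, the injectable clock, and the onEvict callback are assumptions made for illustration; none of them come from KafkaIO or from this thread, and the cached value simply stands in for a Kafka producer keyed by a per-sink id.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of KafkaIO): a per-JVM cache that drops idle entries. */
final class IdleExpiringCache<K, V> {
  /** A cached value plus the time it was last handed out. */
  private static final class Entry<V> {
    final V value;
    long lastUsedMillis;

    Entry(V value, long now) {
      this.value = value;
      this.lastUsedMillis = now;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final long maxIdleMillis;
  private final LongSupplier clock;   // injectable so expiry can be tested
  private final Consumer<V> onEvict;  // e.g. close the evicted producer

  IdleExpiringCache(long maxIdleMillis, LongSupplier clock, Consumer<V> onEvict) {
    this.maxIdleMillis = maxIdleMillis;
    this.clock = clock;
    this.onEvict = onEvict;
  }

  /** Returns the value cached for key, creating it if it is absent or has expired. */
  synchronized V get(K key, Function<K, V> factory) {
    long now = clock.getAsLong();
    evictIdle(now);
    Entry<V> entry = entries.get(key);
    if (entry == null) {
      entry = new Entry<>(factory.apply(key), now);
      entries.put(key, entry);
    }
    entry.lastUsedMillis = now; // every use resets the inactivity timer
    return entry.value;
  }

  synchronized int size() {
    return entries.size();
  }

  /** Removes every entry that has been idle for longer than maxIdleMillis. */
  private void evictIdle(long now) {
    Iterator<Map.Entry<K, Entry<V>>> it = entries.entrySet().iterator();
    while (it.hasNext()) {
      Entry<V> entry = it.next().getValue();
      if (now - entry.lastUsedMillis > maxIdleMillis) {
        it.remove();
        onEvict.accept(entry.value);
      }
    }
  }
}
```

In this sketch eviction happens lazily on the next get() call rather than on a background timer, which keeps the example small; a sink would call get(sinkId, id -> createProducer(id)) where createProducer is whatever factory the sink already has.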

Re: [PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-07-29 Thread Jean-Baptiste Onofré
+1 It sounds very good. Regards JB On 07/27/2016 05:20 AM, Kenneth Knowles wrote: Hi everyone, I would like to offer a proposal for a much-requested feature in Beam: Stateful processing in a DoFn. Please check out and comment on the proposal at this URL: https://s.apache.org/beam-state

Re: Suggestion for Writing Sink Implementation

2016-07-29 Thread Jean-Baptiste Onofré
I prefer the "IO" approach as it provides the advanced features leveraged by the Beam model. My $0.01 Regards JB On 07/29/2016 07:45 PM, Raghu Angadi wrote: It is the preferred pattern I think. Is your source bounded or unbounded (i.e. streaming)? If it is the latter, your sink could even be

Proposal: Dynamic PipelineOptions

2016-07-29 Thread Sam McVeety
During the graph construction phase, the given SDK generates an initial execution graph for the program. At execution time, this graph is executed, either locally or by a service. Currently, Beam only supports parameterization at graph construction time. Both Flink and Spark supply

Re: [PROPOSAL] State and Timers for DoFn (aka per-key workflows)

2016-07-29 Thread Aljoscha Krettek
+1 Very nice proposal and the API already looks very good. I guess the only thing people still like to discuss on this is naming of things. :-) I just have one general remark about giving users access to state and timers. The Beam model takes great care to mostly shield users from the reality of

Re: Suggestion for Writing Sink Implementation

2016-07-29 Thread Raghu Angadi
It is the preferred pattern I think. Is your source bounded or unbounded (i.e. streaming)? If it is the latter, your sink could even be simpler than JB's, e.g. KafkaIO.write(), where it just writes the messages to Kafka in processElement(). The pros are pretty clear: runner independent, pure Beam,

Re: Suggestion for Writing Sink Implementation

2016-07-29 Thread Chawla,Sumit
Any more comments on this pattern suggested by Jean? Regards Sumit Chawla On Thu, Jul 28, 2016 at 1:34 PM, Kenneth Knowles wrote: > What I said earlier is not quite accurate, though my advice is the same. > Here are the corrections: > > - The Write transform actually

Re: Podling Report Reminder - August 2016

2016-07-29 Thread James Malone
Hey JB, Good catch; my sincere apologies that wasn't on the list! At this point I propose we put the draft report with your addition on the wiki. Unless anyone disagrees, can you please do that, JB? I don't have wiki edit access. Cheers! James On Thu, Jul 28, 2016 at 1:15 PM Jean-Baptiste Onofré