Re: BEAM-307 (KafkaIO on Kafka 0.10)

2017-02-08 Thread Raghu Angadi
On Wed, Feb 8, 2017 at 12:19 PM, Jesse Anderson wrote: > I'm not. There was a decent amount of time between the first 0.8 and 0.9 > release. > The ones that affect us are minor changes between 0.9 and 0.10 (e.g. changing a vararg to Collection<>). Maybe both could have existed

Re: BEAM-307 (KafkaIO on Kafka 0.10)

2017-02-08 Thread Raghu Angadi
're 0.8, 0.9 and 0.10 for different teams. we'd take care of users > who are on old versions. > > > On Wed, Feb 8, 2017 at 10:56 AM, Raghu Angadi <rang...@google.com.invalid> > wrote: > > > If we let the user pick their kafka version in their dependencies, > simple

Re: BEAM-307 (KafkaIO on Kafka 0.10)

2017-02-06 Thread Raghu Angadi
sed security service. In 0.9 the SASL-based > implementation is fixed with Kerberos. > Kafka client 0.10 cannot connect to Kafka server 0.9, that's why I mention > a separated project. > > Mingmin > > On Mon, Feb 6, 2017 at 11:45 AM, Raghu Angadi <rang...@google.com.invalid&g

Re: BEAM-307 (KafkaIO on Kafka 0.10)

2017-02-06 Thread Raghu Angadi
Current KafkaIO works just fine with Kafka 0.10. I don't know of any incompatibilities or regressions. It does not take advantage of message timestamps, of course. It would be good to handle them in a backward-compatible way.. it might be required anyway if they are optional in 0.10. Not

Re: Metrics for Beam IOs.

2017-02-14 Thread Raghu Angadi
On Tue, Feb 14, 2017 at 9:21 AM, Ben Chambers wrote: > > > * I also think there are data source specific metrics that a given IO > will > > want to expose (i.e., things like Kafka backlog for a topic.) UnboundedSource has an API for backlog. It is better for

Re: Merge HadoopInputFormatIO and HDFSIO in a single module

2017-02-15 Thread Raghu Angadi
I skimmed through HdfsIO and I think it is essentially HadoopInputFormatIO with FileInputFormat. I would pretty much move most of the code to HadoopInputFormatIO (just make HdfsIO a specific instance of HIF_IO). On Wed, Feb 15, 2017 at 9:15 AM, Dipti Kulkarni < dipti_dkulka...@persistent.com>

Exactly-once Kafka sink

2017-08-02 Thread Raghu Angadi
Kafka 0.11 added support for transactions[1], which allows end-to-end exactly-once semantics. Beam's KafkaIO users can benefit from these while using runners that support exactly-once processing. I have an implementation of EOS support for the Kafka sink: https://github.com/apache/beam/pull/3612 It
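For illustration, a minimal sketch of what using the exactly-once sink could look like from the pipeline side. The broker addresses, topic, and shard count are placeholders, and withEOS(numShards, sinkGroupId) is assumed to be the option introduced by this work; it only takes effect on runners that can provide the guarantees the sink needs.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringSerializer;

public class EosKafkaSinkExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(Create.of(KV.of("user1", "click"), KV.of("user2", "view"))
            .withCoder(KvCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
        .apply(KafkaIO.<String, String>write()
            .withBootstrapServers("broker-1:9092,broker-2:9092") // placeholder brokers
            .withTopic("results")                                // placeholder topic
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class)
            // Writes happen inside Kafka transactions, sharded across 5 writers
            // identified by the sink group id; requires Kafka 0.11+ brokers.
            .withEOS(5, "eos-results-sink"));

    p.run().waitUntilFinish();
  }
}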

Re: Exactly-once Kafka sink

2017-08-09 Thread Raghu Angadi
> >> https://github.com/apache/flink/blob/62e99918a45b7215c099fbcf160d45aa02d4559e/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.java#L55

Re: Exactly-once Kafka sink

2017-08-10 Thread Raghu Angadi
higher latency so we should only > do it if the DoFn signals that it needs deterministic input. > > > > +Jingsong Who is working on something similar for the output produced in > finishBundle(). > > > >> On 9. Aug 2017, at 19:41, Raghu Angadi <rang...@google.com.INVALID

Re: KafkaIO, Warning on offset gap

2017-06-28 Thread Raghu Angadi
Fixing it in https://github.com/apache/beam/pull/3461. Thanks for reporting the issue. On Wed, Jun 28, 2017 at 8:37 AM, Raghu Angadi <rang...@google.com> wrote: > Hi Elmar, > > You are right. We should not log this at all when the gaps are expected as > you pointed out. I don'

Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-05 Thread Raghu Angadi
I would like to request merging two Kafka-related PRs: #3461, #3492. Especially the second one, as it improves user experience in case of server misconfiguration that prevents connections between workers and the

Re: KafkaIO, Warning on offset gap

2017-06-28 Thread Raghu Angadi
Hi Elmar, You are right. We should not log this at all when the gaps are expected, as you pointed out. I don't think the client can check whether compaction is enabled for a topic through the Consumer API. I think we should remove the log. The user can't really act on it other than reporting it. I will send a

Re: Style of messages for checkArgument/checkNotNull in IOs

2017-07-28 Thread Raghu Angadi
On Fri, Jul 28, 2017 at 11:21 AM, Thomas Groh wrote: > I'm in favor of the wording in the style of the first: it's an immediately > actionable message that will have an associated stack trace, but will > provide the parameter in plaintext so the caller doesn't have to
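As a small illustration of that first style, a sketch follows; the ReadConfig class and its parameters are hypothetical, and only the wording of the messages is the point.

import static com.google.common.base.Preconditions.checkArgument;

// Hypothetical connector configuration; the message style is what matters here.
class ReadConfig {
  private final String topic;
  private final int numSplits;

  ReadConfig(String topic, int numSplits) {
    // Each message names the offending setter and the bad value, so the stack
    // trace alone tells the caller exactly what to fix.
    checkArgument(topic != null && !topic.isEmpty(),
        "withTopic(topic) requires a non-empty topic, but got: %s", topic);
    checkArgument(numSplits > 0,
        "withNumSplits(numSplits) requires a positive value, but got: %s", numSplits);
    this.topic = topic;
    this.numSplits = numSplits;
  }
}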

Re: Exactly-once Kafka sink

2017-08-08 Thread Raghu Angadi
> https://github.com/apache/flink/blob/62e99918a45b7215c099fbcf160d45aa02d4559e/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.java#L55 > [2] https://github.com/apache/flink/pull/4239 > > On 3. Aug 2017, at 04:03, Raghu Ang

Re: low availability in the coming 4 weeks

2017-05-25 Thread Raghu Angadi
Congrats Mingmin. All the best! On Wed, May 24, 2017 at 8:33 PM, Mingmin Xu wrote: > Hello everyone, > > I'll take 4 weeks off to take care of my new born baby. I'm very glad that > James Xu agrees to take my role in Beam SQL feature. > > Ps, I'll consolidate the PR for

Re: kafka docs

2017-08-29 Thread Raghu Angadi
Thanks Joey. I couldn't find the PR. Do you have a link? On Tue, Aug 29, 2017 at 1:48 AM, Joey Baruch wrote: > created pr on the beam site (just reference change for now, as Jean > Baptiste wrote) > > Thanks Aviem! > > On Tue, Aug 29, 2017 at 11:46 AM Aviem Zur

Re: Java pre/post commit test suite breakage

2017-10-23 Thread Raghu Angadi
Regarding (1): [4] did have a file without the Apache License. It was fixed the next day ( commit ), thanks to Kenn Knowles who pinged me about it. On Mon, Oct 23, 2017 at 11:45 AM,

Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Raghu Angadi
-1 for (a): no need to see all the private branch commits from a contributor. It often makes me more conscious of local commits. +1 for (b): with the committer replacing the squashed commit messages with '[BEAM-jira or PR ID]: Brief cut-n-paste (or longer if the contributor provided one)'. -1 for (c):

Re: Guarding against unsafe triggers at construction time

2017-12-04 Thread Raghu Angadi
I have been thinking about this since last week's discussions about buffering in sinks and was reading https://s.apache.org/beam-sink-triggers. It says BEAM-3169 is an example of a bug caused by a misunderstanding of trigger semantics. - I would like to know which part of the (documented) trigger

Re: makes bundle concept usable?

2017-11-17 Thread Raghu Angadi
ven > On Fri, Nov 17, 2017 at 1:40 PM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote: > Yes, what I propose earlier was

Re: makes bundle concept usable?

2017-11-17 Thread Raghu Angadi
n, no ? > https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html > Regards

Re: makes bundle concept usable?

2017-11-16 Thread Raghu Angadi
The core issue here is that there is no explicit concept of 'checkpoint' in Beam (UnboundedSource has a method 'getCheckpointMark', but that refers to the checkpoint on the external source). Runners do checkpoint internally as an implementation detail. Flink's checkpoint model is entirely different from

Re: Is upgrading to Kafka Client 0.10.2.0+ in the roadmap?

2017-10-30 Thread Raghu Angadi
Shen, KafkaIO works with all the versions since 0.9. Just include the kafka-clients version you like in your Maven dependencies along with the Beam dependencies. Glad to hear Kafka 0.10.2 made it simpler to provide this config. On Mon, Oct 30, 2017 at 8:14 AM, Shen Li wrote: >

Re: Is upgrading to Kafka Client 0.10.2.0+ in the roadmap?

2017-10-30 Thread Raghu Angadi
> https://issues.apache.org/jira/browse/BEAM-307 This should be closed. On Mon, Oct 30, 2017 at 9:00 AM, Lukasz Cwik wrote: > There has been some discussion about getting Kafka 0.10.x working on > BEAM-307[1]. > > As an immediate way to unblock yourself, modify your

Re: Is upgrading to Kafka Client 0.10.2.0+ in the roadmap?

2017-10-30 Thread Raghu Angadi
sm", "IAF"); > > consumerPara.put("security.protocol", "SASL_PLAINTEXT"); > > consumerPara.put("sasl.login.class", "."); > > consumerPara.put("sasl.callback.handler.class", "...&

Re: Pubsub to Beam SQL

2018-05-04 Thread Raghu Angadi
On Thu, May 3, 2018 at 12:47 PM Anton Kedin wrote: > I think it makes sense for the case when timestamp is provided in the > payload (including pubsub message attributes). We can mark the field as an > event timestamp. But if the timestamp is internally defined by the source >

Re: What is the future of Reshuffle?

2018-05-19 Thread Raghu Angadi
On Sat, May 19, 2018 at 8:11 AM Robert Bradshaw <rober...@google.com> wrote: > On Fri, May 18, 2018 at 6:29 PM Raghu Angadi <rang...@google.com> wrote: > >> True. I am still failing to see what is broken about Reshuffle that is >> also not broken with GroupByKey

Re: What is the future of Reshuffle?

2018-05-20 Thread Raghu Angadi
On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw <rober...@google.com> wrote: > On Sat, May 19, 2018 at 6:27 PM Raghu Angadi <rang...@google.com> wrote: > >> [...] >> > I think it would be much more user friendly to un-deprecate it to add a >> warning f

Re: What is the future of Reshuffle?

2018-05-21 Thread Raghu Angadi
Filed https://issues.apache.org/jira/browse/BEAM-4372 (unassigned). On Mon, May 21, 2018 at 10:22 AM Raghu Angadi <rang...@google.com> wrote: > > > On Mon, May 21, 2018 at 9:56 AM Robert Bradshaw <rober...@google.com> > wrote: > >> We should probably keep the war

Re: What is the future of Reshuffle?

2018-05-21 Thread Raghu Angadi
id, any reorganization is much better than deprecation. Raghu. > > On Sun, May 20, 2018 at 6:38 PM Raghu Angadi <rang...@google.com> wrote: > >> >> >> On Sat, May 19, 2018 at 10:55 PM Robert Bradshaw <rober...@google.com> >> wrote: >

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
I am interested in more clarity on this as well. It has been deprecated for a long time without a replacement, and its usage has only grown, both within the Beam code base as well as in user applications. If we are certain that it will not be removed before there is a good replacement for it, can we
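For readers unfamiliar with why usage keeps growing, a minimal sketch of the most common pattern, breaking fusion after a fan-out; the DoFns and element counts are illustrative only.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;

public class ReshuffleExample {
  // Static DoFns so they are trivially serializable.
  static class ExpandFn extends DoFn<String, String> {
    @ProcessElement
    public void process(ProcessContext c) {
      for (int i = 0; i < 1000; i++) {
        c.output(c.element() + "-" + i); // one input element fans out into many
      }
    }
  }

  static class ProcessFn extends DoFn<String, Void> {
    @ProcessElement
    public void process(ProcessContext c) {
      // expensive per-element work would go here
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(Create.of("task-a", "task-b"))
        .apply("Expand", ParDo.of(new ExpandFn()))
        // Break fusion so the fanned-out elements can be rebalanced across workers.
        .apply(Reshuffle.viaRandomKey())
        .apply("Process", ParDo.of(new ProcessFn()));
    p.run();
  }
}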

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
.@google.com> > wrote: > >> Agreed that it should be undeprecated, many users are getting confused by >> this. >> I know that some people are working on a replacement for at least one of >> its use cases (RequiresStableInput), but the use case of breaking fusion &

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
is better than doing > nothing, and our use of URNs for this kind of thing is flexible enough that > we can deprecate old ones if/when we have time to pound out the right > solution. > > >> >>> Kenn >>> >>> On Fri, May 18, 2018 at 4:05 PM Raghu Angadi <r

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
> >> On Fri, May 18, 2018 at 7:50 AM Eugene Kirpichov <kirpic...@google.com> >> wrote: >> >>> Agreed that it should be undeprecated, many users are getting confused >>> by this. >>> I know that some people are working on a replacement for at lea

Re: What is the future of Reshuffle?

2018-05-18 Thread Raghu Angadi
n't provide that guarantee. That is what ExactlyOnceSink in KafkaIO does [1]. [1] https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L1049 > Kenn > > On Fri, May 18, 2018 at 4:05 PM Raghu Angadi <rang...@google.com>

Re: [FYI] New Apache Beam Swag Store!

2018-06-08 Thread Raghu Angadi
Woo-hoo! This is terrific. If we are increasing color choices I would like black or charcoal... Beam logo would really pop on a dark background. On Fri, Jun 8, 2018 at 3:32 PM Griselda Cuevas wrote: > Hi Beam Community, > > I just want to share with you the exciting news about our brand new

Re: [VOTE] Code Review Process

2018-06-04 Thread Raghu Angadi
+1 On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh wrote: > As we seem to largely have consensus in "Reducing Committer Load for Code > Reviews"[1], this is a vote to change the Beam policy on Code Reviews to > require that > > (1) At least one committer is involved with the code review, as either

Re: [DISCUSS] Automation for Java code formatting

2018-06-27 Thread Raghu Angadi
On Wed, Jun 27, 2018 at 10:13 AM Kenneth Knowles wrote: > Nope! No discretion allowed :-) > +1. Fair enough! > > On Wed, Jun 27, 2018 at 9:57 AM Raghu Angadi wrote: > >> +1. >> >> Wondering if it can be configured to reformat only what we care most

Re: [DISCUSS] Automation for Java code formatting

2018-06-27 Thread Raghu Angadi
+1. Wondering if it can be configured to reformat only what we care most about (2-space indentation, etc.), allowing some discretion on the edges. An example of inconsistent formatting that ends up in my code: --- anObject.someLongMethodName(arg_number_1,
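The example above was truncated; a hypothetical before/after illustrating the kind of wrapping inconsistency meant here (not necessarily the exact output of the proposed formatter):

// A common hand-wrapped style:
anObject.someLongMethodName(arg_number_1,
                            arg_number_2,
                            arg_number_3);

// A compact wrapping an autoformatter might settle on instead:
anObject.someLongMethodName(
    arg_number_1, arg_number_2, arg_number_3);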

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Raghu Angadi
On Mon, Apr 30, 2018 at 8:05 AM Chamikara Jayalath wrote: > Hi Aljoscha, > > I tried to cover this in the doc. Once we have full support for > cross-language IO, we can decide this on a case-by-case basis. But I don't > think we should cease defining new sources/sinks for

Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-01-10 Thread Raghu Angadi
How often does your pipeline checkpoint/snapshot? If the failure happens before the first checkpoint, the pipeline could restart without any state, in which case KafkaIO would read from the latest offset. There is probably some way to verify whether the pipeline is restarting from a checkpoint. On Sun, Jan 7,
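A sketch of the KafkaIO reader settings relevant to this discussion; the broker, topic, and group id are placeholders, and commitOffsetsInFinalize() is assumed to be available in the KafkaIO version in use.

import com.google.common.collect.ImmutableMap;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class KafkaOffsetConfigExample {
  static KafkaIO.Read<byte[], byte[]> buildRead() {
    return KafkaIO.<byte[], byte[]>read()
        .withBootstrapServers("broker-1:9092")             // placeholder broker
        .withTopic("events")                               // placeholder topic
        .withKeyDeserializer(ByteArrayDeserializer.class)
        .withValueDeserializer(ByteArrayDeserializer.class)
        .updateConsumerProperties(ImmutableMap.<String, Object>of(
            ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group",
            // Where the consumer starts when there is no Beam checkpoint
            // and no committed offset for the group:
            ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"))
        // Commit offsets back to Kafka when the runner finalizes a checkpoint,
        // so a restart without Beam state resumes near where it left off.
        .commitOffsetsInFinalize();
  }
}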

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Raghu Angadi
; - the option exists only because we couldn't come up > with > >>> another way to implement idempotent rename on GCS. > >>> > >>> What's your idea of how a safe retryable GCS rename() could work? > >>> > >>> On Wed, Jan 31, 2018 a

Re: Filesystems.copy and .rename behavior

2018-02-02 Thread Raghu Angadi
to be retry tolerant while setting the transform. I think the current behavior of not overwriting the output would be very surprising to the unsuspecting users. > > On Fri, Feb 2, 2018 at 9:54 AM, Raghu Angadi <rang...@google.com> wrote: > >> In a batch pipeline, is it consid

RFC: Better event time and source watermark in KafkaIO

2018-02-01 Thread Raghu Angadi
Kafka has supported server-side and client-side timestamps since version 0.10.1. KafkaIO in Beam can provide a much better watermark, especially for topics with server-side timestamps. The default implementation currently just uses processing time for event time and watermark, which is not very useful.
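A sketch of what this could look like for users, assuming the TimestampPolicyFactory API that KafkaIO later gained; the broker and topic are placeholders.

import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.io.kafka.TimestampPolicyFactory;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaLogAppendTimeExample {
  static KafkaIO.Read<String, String> buildRead() {
    return KafkaIO.<String, String>read()
        .withBootstrapServers("broker-1:9092")            // placeholder broker
        .withTopic("events")                              // placeholder topic
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Use the broker-assigned (LogAppendTime) timestamp as the event time;
        // the watermark then tracks these server-side timestamps instead of
        // processing time.
        .withTimestampPolicyFactory(TimestampPolicyFactory.withLogAppendTime());
  }
}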

Re: Filesystems.copy and .rename behavior

2018-01-31 Thread Raghu Angadi
The original mail mentions that output from a second run of word_count is ignored. That does not seem as safe as ignoring an error from a second attempt of a step. How do we know the second run didn't produce different output? Overwriting seems more accurate than ignoring. Does handling this error at the sink level

Re: Filesystems.copy and .rename behavior

2018-01-31 Thread Raghu Angadi
ing is more in line with user expectations. >> I believe that the sink should not ignore errors from the filesystem >> layer. Instead, the FileSystem API should be more well defined. >> Examples: rename() and copy() should overwrite existing files at the >> destination, c

Re: KafkaIO reading from latest offset when pipeline fails on FlinkRunner

2018-02-05 Thread Raghu Angadi
ffset after data is >>> read, and once job is restarted KafkaIO reads from last_committed_offset. >>> >>> In my jobs, I enable external(external should be optional I think?) >>> checkpoint on exactly-once mode in Flink cluster. When the job auto-restart &

Re: coder evolutions?

2018-02-05 Thread Raghu Angadi
Could you describe the 2nd issue in a bit more detail, maybe with a short example? LengthAwareCoder in the PR adds another buffer copy (BufferedElementCountingOutputStream already has an extra buffer copy). On Mon, Feb 5, 2018 at 10:34 AM, Romain Manni-Bucau wrote: > Would this

Re: A 15x speed-up in local Python DirectRunner execution

2018-02-08 Thread Raghu Angadi
This is terrific news! Thanks Charles. On Wed, Feb 7, 2018 at 5:55 PM, Charles Chen wrote: > Local execution of Beam pipelines on the Python DirectRunner currently > suffers from performance issues, which makes it hard for pipeline authors > to iterate, especially on medium to

Re: [DISCUSS] Versioning, Hadoop related dependencies and enterprise users

2018-08-28 Thread Raghu Angadi
Thanks for the IO versioning summary. KafkaIO's policy of 'let the user decide the exact version at runtime' has been quite useful so far. How feasible is that for other connectors? Also, KafkaIO does not limit itself to the minimum features available across all the supported versions. Some of the

Re: SQS source

2018-07-19 Thread Raghu Angadi
A timestamp for a message is fundamental to an element in a PCollection. What do you mean by not knowing the timestamp of a message? There is a finalizeCheckpoint API [1] in UnboundedSource. Does that help? PubSub is also very similar: a message needs to be acked within a timeout, otherwise it will be
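A rough sketch of how finalizeCheckpoint could be used by an SQS-style source. The SqsAcker interface and receipt handles are hypothetical stand-ins for whatever client the source holds; CheckpointMark.finalizeCheckpoint() is the actual Beam API.

import java.io.IOException;
import java.io.Serializable;
import java.util.List;
import org.apache.beam.sdk.io.UnboundedSource;

public class SqsCheckpointMark implements UnboundedSource.CheckpointMark, Serializable {
  // Hypothetical hook for deleting (acking) messages; a real source would hold
  // a reference back to its reader or an SQS client here.
  public interface SqsAcker extends Serializable {
    void delete(String receiptHandle);
  }

  private final List<String> receiptHandles; // messages read since the last checkpoint
  private final SqsAcker acker;

  public SqsCheckpointMark(List<String> receiptHandles, SqsAcker acker) {
    this.receiptHandles = receiptHandles;
    this.acker = acker;
  }

  @Override
  public void finalizeCheckpoint() throws IOException {
    // Called once the runner has durably committed the checkpoint, i.e. the
    // messages are safely part of the pipeline state; only now delete them
    // from the queue so they are not redelivered.
    for (String handle : receiptHandles) {
      acker.delete(handle);
    }
  }
}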

Re: SQS source

2018-07-23 Thread Raghu Angadi
t method get called, and hence, my messages keep getting > redelivered. > Are you testing with direct runner? It should be called after first stage processes (i.e. the checkpoint mark is durably committed by the runner). Raghu. > > > On Thu, Jul 19, 2018 at 5:26 PM, Raghu Anga

Re: "Radically modular data ingestion APIs in Apache Beam" @ Strata - slides available

2018-03-08 Thread Raghu Angadi
Terrific! Thanks Eugene. Just the slides themselves are so good, can't wait for the video. Do you know when the video might be available? On Thu, Mar 8, 2018 at 12:16 PM Eugene Kirpichov wrote: > Oops that's just the template I used. Thanks for noticing, will regenerate >

Re: Running Nexmark with PubSub issue

2018-04-11 Thread Raghu Angadi
I noticed it too while adding KafkaIO support for Nexmark (this was in parallel to another PR for KafkaIO that got merged recently). The anonymous inner class for DoFn is not serializable. I moved it to a static class in my branch, but didn't test it yet :

Re: Running Nexmark with PubSub issue

2018-04-13 Thread Raghu Angadi
t;k...@google.com> wrote: > > Yea, this is a common issue with serializable anonymous inner classes in > general. It would be nice if Beam Java could have an overarching solution > to limiting the closure to things actually touched. > > Kenn > > On Wed, Apr 11, 2018 at 10:30 AM Raghu A

Re: About the Gauge metric API

2018-04-06 Thread Raghu Angadi
> > I would be in favor of replacing the existing Gauge.set(long) API with the > String version and removing the old one. This would be a breaking change. > However this is a relatively new API and is still marked @Experimental. > Keeping the old API would retain the potential confusion. It's

Re: About the Gauge metric API

2018-04-06 Thread Raghu Angadi
ibly some short migration period) worthwhile. On another note, I still > find the distributed gauge API to be a bit odd in general. > > On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi <rang...@google.com> wrote: > >> I would be in favor of replacing the existing Gauge.set(long) API w

Re: When will the Dataflow Runner invoke the UnboundedReader.close() method and decode the checkpoint?

2018-10-18 Thread Raghu Angadi
Decoding a checkpoint is required only while resuming a reader. Typically this happens while reopening the reader after it is closed (for any reason), while restarting the pipeline with a previous checkpoint (as in a Dataflow update), when the work moves to a different worker, or if the worker

Re: KafkaIO - Deadletter output

2018-10-24 Thread Raghu Angadi
id, >> >> Just apply a regular ParDo and return a PCollectionTuple; after that you >> can extract your Success records (TupleTag) and your DeadLetter >> records (TupleTag) and do whatever you want with them. >> >> >> Raghu Angadi schrieb am Mi.,

Re: KafkaIO - Deadletter output

2018-10-24 Thread Raghu Angadi
ser is attempting to handle errors when parsing the > timestamp. The timestamp controls the watermark for the UnboundedSource, > how would they control the watermark in a downstream ParDo? > > On Wed, Oct 24, 2018 at 9:48 AM Raghu Angadi wrote: > >> On Wed, Oct 24, 2018 at 7:19 AM Cham

Re: KafkaIO - Deadletter output

2018-10-24 Thread Raghu Angadi
To be clear, returning min_timestamp for unparsable records should not affect the watermark. On Wed, Oct 24, 2018 at 10:32 AM Raghu Angadi wrote: > How about returning min_timestamp? They would be dropped or redirected by > the ParDo after that. > Btw, TimestampPolicyFactory.withTi
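A short sketch of that suggestion, assuming the TimestampPolicyFactory.withTimestampFn and BoundedWindow.TIMESTAMP_MIN_VALUE APIs; the parseEventTime helper is hypothetical.

import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicyFactory;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

public class MinTimestampPolicyExample {
  static TimestampPolicyFactory<byte[], byte[]> timestampPolicy() {
    SerializableFunction<KafkaRecord<byte[], byte[]>, Instant> timestampFn =
        record -> {
          Instant parsed = parseEventTime(record); // hypothetical parser
          // Per the discussion above, unparsable records get the minimum timestamp,
          // which should not affect the watermark; a downstream ParDo can then drop
          // them or redirect them to a dead-letter output.
          return parsed != null ? parsed : BoundedWindow.TIMESTAMP_MIN_VALUE;
        };
    return TimestampPolicyFactory.withTimestampFn(timestampFn);
  }

  static Instant parseEventTime(KafkaRecord<byte[], byte[]> record) {
    // Placeholder: a real implementation would decode the record payload here.
    return null;
  }
}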

Re: KafkaIO - Deadletter output

2018-10-24 Thread Raghu Angadi
that doesn’t have one? > On Wed, Oct 24, 2018 at 12:25 PM Raghu Angadi wrote: > >> To be clear, returning min_timestamp for unparsable records shound not >> affect the watermark. >> >> On Wed, Oct 24, 2018 at 10:32 AM Raghu Angadi wrote: >> >>>

Re: KafkaIO - Deadletter output

2018-10-24 Thread Raghu Angadi
beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/BoundedWindow.java#L48 > > > On Wed, Oct 24, 2018 at 12:51 PM Raghu Angadi wrote: > >> >> >> On Wed, Oct 24, 2018 at 12:33 PM Lukasz Cwik wro

Re: KafkaIO - Deadletter output

2018-10-24 Thread Raghu Angadi
sdk.values.TupleTag-T-org.joda.time.Instant- > > On Wed, Oct 24, 2018 at 1:21 PM Raghu Angadi wrote: > >> Thanks. So returning min timestamp is OK, right (assuming the application >> is fine with what it means)? >> >> On Wed, Oct 24, 2018 at 1:17 PM Lukasz Cwik wro

Re: KafkaIO - Deadletter output

2018-10-25 Thread Raghu Angadi
On Thu, Oct 25, 2018 at 10:47 AM Chamikara Jayalath wrote: > > > On Thu, Oct 25, 2018 at 10:41 AM Raghu Angadi wrote: > >> >> On Thu, Oct 25, 2018 at 10:28 AM Chamikara Jayalath >> wrote: >> >>> Not sure if I understand why this would require K

Re: KafkaIO - Deadletter output

2018-10-25 Thread Raghu Angadi
ue. I think this same level of support >>> is also required for records that cannot have timestamps extracted in an >>> unbounded source. >>> >>> In an SDF I think the function has enough control to do it all in >>> "userland", so Cham is

Re: KafkaIO - Deadletter output

2018-10-23 Thread Raghu Angadi
A user can read serialized bytes from KafkaIO and deserialize explicitly in a ParDo, which gives complete control over how to handle record errors. This is what I would do if I needed to in my pipeline. If there is a transform in Beam that does this, it could be convenient for users in many such scenarios.
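A condensed sketch of that approach; the broker, topic, and parsing logic are placeholders, while the TupleTag/PCollectionTuple pattern is standard Beam.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Values;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class KafkaDeadLetterExample {
  static final TupleTag<String> PARSED = new TupleTag<String>() {};
  static final TupleTag<byte[]> DEAD_LETTER = new TupleTag<byte[]>() {};

  static class ParseFn extends DoFn<byte[], String> {
    @ProcessElement
    public void process(ProcessContext c) {
      try {
        c.output(parse(c.element()));          // successfully parsed records
      } catch (Exception e) {
        c.output(DEAD_LETTER, c.element());    // raw bytes go to the dead-letter output
      }
    }

    private static String parse(byte[] bytes) {
      // Placeholder parser; a real one would deserialize the payload.
      return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<byte[]> rawValues = p
        .apply(KafkaIO.<byte[], byte[]>read()
            .withBootstrapServers("broker-1:9092")           // placeholder broker
            .withTopic("events")                             // placeholder topic
            .withKeyDeserializer(ByteArrayDeserializer.class)
            .withValueDeserializer(ByteArrayDeserializer.class)
            .withoutMetadata())
        .apply(Values.create());

    PCollectionTuple results = rawValues.apply(
        ParDo.of(new ParseFn()).withOutputTags(PARSED, TupleTagList.of(DEAD_LETTER)));

    PCollection<String> parsed = results.get(PARSED);        // write to the main sink
    PCollection<byte[]> deadLetters = results.get(DEAD_LETTER); // write to a dead-letter sink/topic

    p.run();
  }
}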

Re: Is Splittable DoFn suitable for fetch data from a socket server?

2018-09-21 Thread Raghu Angadi
> This in-house built socket server could accept multiple clients, but only send messages to the first-connected client, and will send messages to the second client if the first one disconnected. The server sending messages to only the first client connection is quite critical. Even if you use the Source API

Re: [ANNOUNCEMENT] New Beam chair: Kenneth Knowles

2018-09-20 Thread Raghu Angadi
Congrats Kenn! On Wed, Sep 19, 2018 at 12:54 PM Davor Bonaci wrote: > Hi everyone -- > It is with great pleasure that I announce that at today's meeting of the > Foundation's Board of Directors, the Board has appointed Kenneth Knowles as > the second chair of the Apache Beam project. > > Kenn

Dev contact for KafkaIO

2019-01-09 Thread Raghu Angadi
Hi Everyone, Last Friday was my last day at Google and on the Dataflow team. I had a great time and learnt a lot from working on both Apache Beam and Dataflow. My new job at a startup is not directly related to Apache Beam and I will not be able to spend a lot of time on support or development of

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-14 Thread Raghu Angadi
I would like to propose one more cherrypick for RC2: https://github.com/apache/beam/pull/6391 This is a KafkaIO bug fix. Once a user hits this bug, there is no easy workaround for them, especially on Dataflow. The only workaround in Dataflow is to restart or reload the job. The fix itself is fairly safe

Re: [VOTE] Release 2.7.0, release candidate #1

2018-09-17 Thread Raghu Angadi
cychen)? > > Romain, JB: is there any way I can help with debugging the issue you're > facing so we can unblock the release? > > On Fri, Sep 14, 2018 at 1:49 PM Raghu Angadi wrote: > >> I would like propose one more cherrypick for RC2 : >> https://github.com/apache/bea

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Raghu Angadi
support them more obvious? (E.g., "this PR is blocked because someone at >>>>> Google on Dataflow team has to fix something" vs "this PR is blocked >>>>> because the Apache Beam code in foo/bar/baz is failing, and anyone who can

Re: Donating the Dataflow Worker code to Apache Beam

2018-09-13 Thread Raghu Angadi
This is terrific! Is this thread asking for opinions from the community about whether it should be accepted? Assuming the Google-side decision is made to contribute, a big +1 from me to include it next to the other runners. On Thu, Sep 13, 2018 at 10:38 AM Lukasz Cwik wrote: > At Google we have been importing the

Re: kafka 0.9 support

2019-04-02 Thread Raghu Angadi
Thanks to David Morávek for pointing out a possible improvement to KafkaIO from dropping support for 0.9, since it avoids having a second consumer just to fetch latest offsets for the backlog. Ideally we should drop 0.9 support in the next major release; in fact, it would be better to drop versions before 0.10.1

Re: kafka 0.9 support

2019-04-03 Thread Raghu Angadi
stin Bennett < > whatwouldausti...@gmail.com> wrote: > >> I withdraw my concern -- checked on info on the cluster I will eventually >> access. It is on 0.8, so I was speaking too soon. Can't speak to rest of >> user base. >> >> On Tue, Apr 2, 2019 at 11:03

Re: kafka 0.9 support

2019-04-03 Thread Raghu Angadi
I mean, +1 for removing support for old Kafka versions after the next LTS. What the cutoff should be for 'old' versions can be discussed then. My choice would be 0.11. Raghu. On Wed, Apr 3, 2019 at 4:36 PM Raghu Angadi wrote: > +1 for next LTS. > > On Wed, Apr 3, 2019 at 2:30 PM Ism

Re: KafkaIO Exactly-Once & Flink Runner

2019-02-28 Thread Raghu Angadi
> Now why does the Flink Runner not support KafkaIO EOS? Flink's native > KafkaProducer supports exactly-once. It simply commits the pending > transaction once it has completed a checkpoint. On Thu, Feb 28, 2019 at 9:59 AM Maximilian Michels wrote: > Hi, > > I came across KafkaIO's Runner

Re: KafkaIO Exactly-Once & Flink Runner

2019-02-28 Thread Raghu Angadi
https://github.com/apache/beam/pull/7955 > > On Thu, Feb 28, 2019 at 10:43 AM Reuven Lax wrote: > >> >> >> On Thu, Feb 28, 2019 at 10:41 AM Raghu Angadi wrote: >> >>> >>> Now why does the Flink Runner not support KafkaIO EOS? Flink's na

Re: KafkaIO Exactly-Once & Flink Runner

2019-02-28 Thread Raghu Angadi
lly results in job failure. > Kenn > > On Thu, Feb 28, 2019 at 1:25 PM Raghu Angadi wrote: > >> On Thu, Feb 28, 2019 at 11:01 AM Kenneth Knowles wrote: >> >>> I believe the way you would implement the logic behind Flink's >>> KafkaProducer wo

Re: KafkaIO Exactly-Once & Flink Runner

2019-02-28 Thread Raghu Angadi
On Thu, Feb 28, 2019 at 2:42 PM Raghu Angadi wrote: > On Thu, Feb 28, 2019 at 2:34 PM Kenneth Knowles wrote: > >> I'm not sure what a hard fail is. I probably have a shallow >> understanding, but doesn't @RequiresStableInput work for 2PC? The >> preCommit(

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-01 Thread Raghu Angadi
On Thu, Feb 28, 2019 at 2:42 PM Raghu Angadi wrote: > On Thu, Feb 28, 2019 at 2:34 PM Kenneth Knowles wrote: > >> I'm not sure what a hard fail is. I probably have a shallow >> understanding, but doesn't @RequiresStableInput work for 2PC? The >> preCommit(

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-01 Thread Raghu Angadi
SDF has a checkpoint method which the Runner can call, but I think that you are right, that the above problem would be the same. > Absolutely. I would love to support EOS in KafkaIO fo

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-05 Thread Raghu Angadi
transactions. Imagine: > 1) ExactlyOnceWriter writes records to Kafka and commits,

Re: KafkaIO Exactly-Once & Flink Runner

2019-03-06 Thread Raghu Angadi
On Wed, Mar 6, 2019 at 9:37 AM Kenneth Knowles wrote: > On Tue, Mar 5, 2019 at 10:02 PM Raghu Angadi wrote: > >> >> >> On Tue, Mar 5, 2019 at 7:46 AM Reuven Lax wrote: >> >>> RE: Kenn's suggestion. i think Raghu looked into something that, and >

Re: [ANNOUNCE] New committer announcement: Raghu Angadi

2019-03-11 Thread Raghu Angadi
On Thu, Mar 7, 2019 at 7:09 PM Ahmet Altay <al...@google.com> wrote: > Congratulations! > On Thu, Mar 7, 2019 at 10: